Abstract

Descriptional complexity is the study of the conciseness of the various models representing formal languages. The state complexity of a regular language is the size, measured by the number of states, of the smallest finite automaton, either deterministic or nondeterministic, that recognises it. Operational state complexity is the study of the state complexity of operations over languages. In this survey, we review the state complexities of individual regularity-preserving language operations on regular and some subregular languages. Then we revisit the state complexities of combinations of individual operations. We also review methods of estimation and approximation of the state complexity of more complex combined operations.


1 Introduction 

Automata theory is one of the oldest research areas in computer science. Much research has been done on automata theory since the 1950s. Work in many subareas of automata theory is still ongoing today due to its new applications in areas such as software engineering, programming languages, parallel programming, network security, formal verification, and natural language and speech processing [143, 152, 145, 171, 132, 177].
Descriptional complexity and, in particular, state complexity is one of these active subareas. Generally speaking, the study of complexity mainly focuses on two kinds of issues: time and space complexity issues, i.e., the time and space needed to execute processes; and descriptional complexity issues, i.e., the succinctness of model representations [179]. In general, having succinct objects improves our control over software, which may become smaller, more efficient, and easier to certify.

State complexity is a type of descriptional complexity based on the finite machine model and, in the domain of regular languages, it is related to the basic question of how to measure the size of a finite automaton. For the deterministic finite automaton (DFA) case, the three usual answers are: the number of states, the number of transitions, or a combination of the two. For a complete DFA, whose transition function is defined for every state and every possible input symbol, the number of transitions is linear in the number of states for each fixed alphabet: a complete DFA with $n$ states over an alphabet $\Sigma$ has exactly $n|\Sigma|$ transitions. Thus, the number of states becomes the key measure for the size of a complete DFA. For nondeterministic finite automata (NFA) this notion of completeness is not present, so the measures based on the number of states and on the number of transitions are much more loosely related.

Since a regular language can be accepted by many DFAs with different numbers of states but by only one (up to isomorphism) minimal complete DFA, the deterministic state complexity of a regular language is defined as the number of states of the minimal complete DFA accepting it. If we replace the minimal complete DFA with a minimal NFA, we obtain the definition of nondeterministic state complexity. Since most researchers working in the area use state complexity as a natural abbreviation of deterministic state complexity, we also follow this convention in this paper.

Complexity can be studied in two different flavours: in the worst case [179] and in the average case [147]. The worst-case complexity of a class of regular languages is the supremum of the complexities of all the languages in the class [179], whereas the average-case complexity is the average value of the complexities of those languages. Despite its evident practical importance, there is still very little research on average-case state complexity. For that reason, in this paper we mainly review worst-case results.

Results on descriptional complexity can be roughly divided into representational (or transformational) and operational. Representational complexity studies the complexity of transformations between models by comparing the sizes of different representations of formal languages. For example, given an $n$-state NFA for a regular language, the equivalent DFA has at most $2^n$ states; this result, established in 1957, is considered the first state complexity result [157]. Operational state complexity studies the state complexity of operations on languages. When we speak about the state complexity of an operation on regular languages, we mean the state complexity of the class of languages resulting from the operation [179]. For example, when we say that the state complexity of the intersection of two regular languages, accepted by $m$-state and $n$-state DFAs, respectively, is $mn$, we mean that $mn$ is the worst-case state complexity of the class of regular languages that can be represented as the intersection of an $m$-state DFA language and an $n$-state DFA language. Note that this implies that the intersection of any $m$-state DFA language $L_1$ and any $n$-state DFA language $L_2$ has a DFA with at most $mn$ states (upper bound), and that there exist languages $L_1$ and $L_2$ such that the minimal DFA for $L_1 \cap L_2$ has exactly $mn$ states (lower bound).

In this survey, we mostly concentrate on operational state complexity results. Although the first studies go back to the 1960s and 1970s, research in the area has been most active in the last two decades. This can be partially explained by the fact that, back then, descriptional complexity issues were not a priority for applications, as they are today. But also, due to its combinatorial nature, much of the current research is only possible with the help of new high-performance symbolic manipulation software and powerful computers.

The paper is organized as follows. After some preliminaries in the next section, the notions of deterministic and nondeterministic state complexity are considered in Section 3; understanding the possible gap between both measures is a main topic of research. In Section 4 we review the state complexities of individual regularity-preserving language operations, such as Boolean operations, catenation, star, reversal, shuffle, orthogonal catenation, proportional removal, and cyclic shift. These individual operations are fundamental and important in formal languages and automata theory research and applications. Results in these two sections are given for different classes of (sub)regular languages, e.g., general infinite, finite, unary, and star-free languages. In Section 5 we revisit the state complexities of combined operations, which are combinations of individual operations, e.g., star of union, star of intersection, star of catenation, star of reversal, union of star, and intersection of star. The state complexities of most of these combined operations are much lower than the mathematical composition of the state complexities of their component individual operations. We also review methods of estimation and approximation of the state complexity of combined operations, which can be used for very complex combined operations. Section 6 concludes this survey with some discussion of the results presented, highlighting some open problems and directions of future research.

2 Preliminaries 

Here we recall some basic definitions related to finite automata and regular languages. For a more complete presentation the reader is referred to [178].

The set of natural numbers is denoted by $\mathbb{N}$ and, for $i, j \in \mathbb{N}$, $[i, j] = \{x \in \mathbb{N} \mid i \le x \le j\}$. The power set of a set $S$ is denoted by $2^S$ and the cardinality of a finite set $S$ is $|S|$. In the following, $\Sigma$ always stands for a finite alphabet, the empty word is represented by $\varepsilon$, and the set of all words over $\Sigma$ by $\Sigma^*$. A language is a subset of $\Sigma^*$. We say that $L \subseteq \Sigma^*$ is a unary (respectively, binary, ternary) language if $|\Sigma| = 1$ (respectively, $|\Sigma| = 2$, $|\Sigma| = 3$). Note that this definition does not require that all symbols of $\Sigma$ actually appear in words of $L$, and hence every unary language is also a binary language and every binary language is also a ternary language. A language $L$ is said to be finite if $L$ is a finite subset of $\Sigma^*$.

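A nondeterministic finite automaton (NFA) is a tuple $A = (Q, \Sigma, \delta, q_0, F)$ where $Q$ is a finite set of states, $\Sigma$ is a finite alphabet, $\delta: Q \times \Sigma \to 2^Q$ is the (multi-valued) transition function, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final (accepting) states. The transition function is extended to a function $\hat{\delta}: 2^Q \times \Sigma^* \to 2^Q$ by setting $\hat{\delta}(P, \varepsilon) = P$ and $\hat{\delta}(P, wx) = \bigcup_{q \in \hat{\delta}(P, w)} \delta(q, x)$, for $P \subseteq Q$, $w \in \Sigma^*$, and $x \in \Sigma$. To simplify notation, we denote $\hat{\delta}$ by $\delta$ and write $\delta(q, w)$ for $\delta(\{q\}, w)$. The language recognized by the NFA $A$ is $L(A) = \{w \in \Sigma^* \mid \delta(q_0, w) \cap F \neq \emptyset\}$.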

An NFA $A = (Q, \Sigma, \delta, q_0, F)$ is a complete deterministic finite automaton (DFA) if the transition function $\delta$ is one-valued, that is, $\delta$ is a function $Q \times \Sigma \to Q$. An incomplete DFA allows the possibility that some transitions may be undefined, that is, $\delta$ is a partial function $Q \times \Sigma \to Q$.

Both DFAs and NFAs define the class of regular languages [178]. It is well known that any regular language has a unique (up to isomorphism) minimal complete DFA and a unique minimal incomplete DFA, that is, a DFA with the smallest number of states. For a given regular language, the sizes of the minimal complete DFA and the minimal incomplete DFA differ by at most one state (the sink state). Furthermore, for a given DFA with $n$ states over a fixed alphabet there exists an $O(n \log n)$ time algorithm to compute the equivalent minimal DFA [178]. On the other hand, a regular language may have more than one minimal NFA, and NFA minimization is PSPACE-hard [94].
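The $O(n \log n)$ algorithm referred to above is Hopcroft's partition-refinement algorithm. As a simpler, hedged illustration, the following Python sketch computes the number of states of the minimal complete DFA using Moore-style refinement, which needs quadratic time in the worst case rather than $O(n \log n)$. The encoding of a complete DFA as a dictionary `delta[(state, symbol)]` and the name `minimal_dfa_size` are our own assumptions.

```python
def minimal_dfa_size(alphabet, delta, start, finals):
    """Return the number of states of the minimal complete DFA equivalent to
    the complete DFA given by delta[(state, symbol)] -> state.

    Moore-style partition refinement (a sketch, not Hopcroft's algorithm).
    """
    # 1. Keep only the states reachable from the initial state.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)

    # 2. Start from the partition {final, non-final} and refine until stable.
    block = {q: int(q in finals) for q in reachable}
    num_blocks = len(set(block.values()))
    while True:
        # Two states stay together iff they are in the same block and every
        # symbol sends them into the same block.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            for q in reachable
        }
        ids = {}
        block = {q: ids.setdefault(signature[q], len(ids)) for q in reachable}
        if len(ids) == num_blocks:
            return num_blocks
        num_blocks = len(ids)
```

For instance, a DFA counting the symbol $a$ modulo 6 with final states $\{0, 3\}$ accepts the same language as a 3-state counter modulo 3, and the sketch returns 3 for it.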

3 State Complexity and Nondeterministic State Complexity

The state complexity of a regular language $L$, $sc(L)$, is the number of states of its minimal DFA. The nondeterministic state complexity of a regular language $L$, $nsc(L)$, is the number of states of a minimal NFA that accepts $L$. Since a DFA is in particular an NFA, for any regular language $L$ one has $nsc(L) \le sc(L)$. It is well known that any $m$-state NFA can be converted, via the subset construction, into an equivalent DFA with at most $2^m$ states (we call this conversion determination). Thus, $sc(L) \le 2^{nsc(L)}$.
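A minimal Python sketch of determination (the subset construction), generating only reachable subsets. As an assumed illustration, not one of the families of Figure 1, it uses the standard $(k+1)$-state NFA for the words whose $k$-th symbol from the end is $a$; its $2^k$ reachable subsets are pairwise distinguishable, so the minimal DFA has $2^k$ states. The function names and the dictionary encoding are hypothetical.

```python
from collections import deque


def determinize(alphabet, nfa_delta, start, finals):
    """Subset construction restricted to reachable subsets.

    nfa_delta maps (state, symbol) -> set of states (missing keys mean no
    transition). Returns (dfa_states, dfa_delta, dfa_start, dfa_finals).
    """
    dfa_start = frozenset([start])
    dfa_states = {dfa_start}
    dfa_delta = {}
    queue = deque([dfa_start])
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            # Union of the successors of every NFA state in the subset.
            target = frozenset(r for q in subset for r in nfa_delta.get((q, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                queue.append(target)
    # A subset is accepting iff it contains at least one final NFA state.
    dfa_finals = {s for s in dfa_states if s & finals}
    return dfa_states, dfa_delta, dfa_start, dfa_finals


def kth_from_last_nfa(k):
    """Hypothetical example: (k+1)-state NFA for words over {a, b} whose
    k-th symbol from the end is 'a'."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta, 0, {k}
```

Because only reachable subsets are generated, the empty subset appears only when it is actually reached; it then plays the role of the sink state of the complete DFA.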

Figure 1: Moore (i), Lupanov (ii), and Meyer & Fischer (iii) minimal $m$-state NFAs with equivalent minimal $2^m$-state DFAs

To show that this upper bound is tight, one must exhibit a family of languages $(L_m)_{m \ge 1}$ such that $nsc(L_m) = m$ and $sc(L_m) = 2^m$ for every $m \ge 1$.
In 1963, Lupanov [133] showed that this upper bound is tight using a family of ternary languages. In 1971, Moore, and Meyer and Fischer, presented different families of binary languages. All three families of NFAs are represented in Figure 1. However, for unary languages that upper bound is not achievable. Chrobak [42, 43] proved that if $L$ is a unary language with $nsc(L) = m$, then $sc(L) = O(F(m))$, where

$$F(m) = \max\{\operatorname{lcm}(x_1, \ldots, x_l) \mid x_1, \ldots, x_l \ge 1 \text{ and } x_1 + \cdots + x_l = m\} \qquad (1)$$

is Landau's function and lcm denotes the least common multiple. It is known that $F(m) = e^{(1+o(1))\sqrt{m \ln m}}$, and thus $sc(L) = e^{O(\sqrt{m \ln m})}$. This asymptotic bound is tight, i.e., for every $m$ there exists a unary language $L_m$ such that $nsc(L_m) \le m$ and $sc(L_m) = F(m - 1)$. Other related bounds were studied by Mereghetti and Pighizzini.
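Landau's function (1) can be evaluated exactly for small $m$. The sketch below (a hypothetical helper `landau`, not taken from the cited works) relies on the standard fact that an optimal decomposition in (1) can be taken to consist of powers of distinct primes, padded with 1s, so $F(m)$ becomes a small knapsack over primes. It makes concrete how much smaller $F(m)$ is than $2^m$: for instance, $F(19) = 420$, whereas $2^{19} = 524288$.

```python
def landau(m):
    """Landau's function F(m): the maximum lcm of positive integers summing to m.

    An optimal partition can be assumed to use powers of distinct primes
    (plus 1s), so F(m) is a 0/1 knapsack where each prime contributes at most
    one prime power.
    """
    if m < 1:
        raise ValueError("m must be positive")
    # Primes up to m by a simple sieve of Eratosthenes.
    is_prime = [True] * (m + 1)
    primes = []
    for p in range(2, m + 1):
        if is_prime[p]:
            primes.append(p)
            for multiple in range(p * p, m + 1, p):
                is_prime[multiple] = False
    # best[s] = largest product of powers of distinct primes with sum <= s.
    best = [1] * (m + 1)
    for p in primes:
        # Descending s: best[s - q] still excludes the current prime p.
        for s in range(m, 1, -1):
            q = p
            while q <= s:
                best[s] = max(best[s], best[s - q] * q)
                q *= p
    return best[m]
```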


For a general finite language $L$, if $nsc(L) = m$ then $sc(L) = O(k^{m/(1+\log k)})$, where $k = |\Sigma| > 1$, and this bound is tight. In the case of finite binary languages, $O(2^{m/2})$ is a tight bound. In 1973, Mandl [135] had already proved that, for any finite binary language $L$, if $nsc(L) = m$ then $sc(L) \le 2 \cdot 2^{m/2} - 1$ if $m$ is even, and $sc(L) \le 3 \cdot 2^{\lfloor m/2 \rfloor} - 1$ if $m$ is odd, and that these bounds are tight. Finally, for finite unary languages, nondeterminism does not lead to significant improvements: if $L$ is a finite unary language with $nsc(L) = m$, then $sc(L) \le m + 1$ [135, 168].


Section 4.3 reviews the state complexity of determination for other subregular languages. As will be evident from the results in the following sections, the complexity of determination plays a fundamental role in operational complexity, hence the importance of studying it per se.

The possible gap between state complexity and nondeterministic state complexity for general regular languages led to the notion of magic number, introduced in 2000 by Iwama et al. A number $\alpha \in [m, 2^m]$ is magic for $m$ with respect to a given alphabet of size $k$ if there is no minimal $m$-state NFA whose equivalent minimal DFA has $\alpha$ states. This notion has been extensively researched in the last decade and has been extended to other gaps between two state complexity values [134, 105, 88]. We summarize here some of the obtained results. The general observation is that, apart from unary languages, magic numbers are hard to find. For binary languages, it was shown that if $\alpha = 2^m - 2^n$ or $\alpha = 2^m - 2^n - 1$, for $n \in [0, m/2 - 2]$, or if $\alpha = 2^m - n$ for $n \in [5, 2m - 2]$ and some coprimality condition holds for $n$ [101], then $\alpha$ is not magic. Also, for a binary alphabet, all numbers $\alpha \in [m, m + 2^{\lfloor m/3 \rfloor}]$ have been shown to be non-magic, which improves previous results, namely the range $[m, m^2/2]$ [105] and an earlier range with an exponential upper end [69]. For ternary or quaternary regular languages, and for languages over an alphabet of exponentially growing size, there are no magic numbers [104, 109]. For the unary case, however, all numbers between $e^{(1+o(1))\sqrt{m \ln m}}$ and $2^m$ are trivially magic, since by Chrobak's result the minimal DFA equivalent to an $m$-state unary NFA has $O(F(m))$ states. Moreover, it has been shown that there are many more magic than non-magic numbers in the range from $m$ to $e^{(1+o(1))\sqrt{m \ln m}}$.

In the case of finite languages, partial results were obtained by Holzer et al. [88]: all numbers $\alpha$ in an interval starting at $m + 1$, with an upper end that differs for even and odd $m$, are non-magic. Moreover, certain numbers of the form $3 \cdot 2^{j} + 2^{i} - 1$ for even $m$, and of a similar form for odd $m$, where $i$ ranges over an interval of integers depending on $m$, are also non-magic. In the same paper, the magic number problem is also studied for other subregular language classes.


3.1 State Complexity versus Quotient Complexity 

Quotient complexity, introduced in 2009 by Brzozowski, coincides, for regular languages, with the notion of state complexity, but it is defined in terms of languages and their (left) quotients. The left quotient of a language $L$ by a word $w$ is the language $w^{-1}L = \{x \in \Sigma^* \mid wx \in L\}$. The quotient complexity of $L$, denoted by $\kappa(L)$, is the number of distinct languages that are left quotients of $L$ by some word. It is well known that, for a regular language $L$, the number of left quotients is finite and is exactly the number of states of the minimal DFA accepting $L$: each quotient $w^{-1}L$ is the language accepted from the state of the minimal DFA reached by reading $w$. So, in the case of regular languages, state complexity and quotient complexity coincide. Since quotient complexity is given in terms of languages and their left quotients, algebraic properties of languages can be used to obtain upper bounds for the complexity of operations over languages. Indeed, the proof that the set of (left) quotients of a regular language is finite was one of the earliest studies of state complexity. Quotient complexity can also be useful to show that an upper bound is tight. If a given operation can be expressed as a function of other operations (for example, $L_1 - L_2 = L_1 \cap \overline{L_2}$), then witnesses for the worst-case complexity of those operations can be used to provide a witness for the complexity of the first operation.

4 State Complexity of Individual Operations 

The state complexity of an operation (or operational state complexity) on 
regular languages is the worst-case state complexity of a language resulting 
from the operation, considered as a function of the state complexities of 
the operands. Adapting a formulation from Holzer and Kutrib [93], given a binary operation $\circ$, the $\circ$-language operation state complexity problem can be stated as follows:

• Given an $m$-state DFA $A_1$ and an $n$-state DFA $A_2$.

• How many states are sufficient and necessary, in the worst case, to accept the language $L(A_1) \circ L(A_2)$ by a DFA?

This formulation can be generalized to operations with other arities, other kinds of automata, and other classes of languages. An upper bound can be obtained by providing an algorithm that, given DFAs for the operands, constructs a DFA that accepts the resulting language. The number of states of the resulting DFA is an upper bound for the state complexity of the operation. To show that an upper bound is tight, for each operand a family of languages (one language for each possible value of the state complexity) must be given such that the minimal DFAs of the resulting languages reach that bound. We call those families witnesses. The same approach is used to obtain the nondeterministic state complexity of an operation on regular languages. No proofs are presented here for the stated results, although several examples of families of languages for which the operations achieve a certain upper bound are given. There are very few results on average-case state complexity; whenever some are known, they are mentioned together with the corresponding worst-case analysis.

In this section, the following notation is used. When considering unary operations, let $L$ be a regular language with $sc(L) = m$ ($nsc(L) = m$) and let $A = (Q, \Sigma, \delta, q_0, F)$ be the complete minimal DFA (a minimal NFA) such that $L = L(A)$. Furthermore, $|\Sigma| = k$, or $|\Sigma| = f(m)$ if a growing alphabet is taken into account, $|F| = f$, and $|F - \{q_0\}| = l$. In the same way, for binary operations let $L_1$ and $L_2$ be regular languages over the same alphabet with $sc(L_1) = m$ ($nsc(L_1) = m$) and $sc(L_2) = n$ ($nsc(L_2) = n$), and let $A_i = (Q_i, \Sigma, \delta_i, q_i, F_i)$ be complete minimal DFAs (minimal NFAs) such that $L_i = L(A_i)$, for $i \in [1, 2]$. Furthermore, $|\Sigma| = k$, or $|\Sigma| = f(m, n)$ if a growing alphabet is taken into account, $|F_i| = f_i$, and $|F_i - \{q_i\}| = l_i$, for $i \in [1, 2]$.

4.1 Basic Operations 

In this section we review the main results related to the state complexity (and nondeterministic state complexity) of some basic operations on regular languages: Boolean operations (mainly union, intersection, and complement), catenation, star (and plus), and reversal. For some classes of languages, left and right quotients are also considered. Because of their particular characteristics, already pointed out in Section 3, for each operation the languages are divided into regular ($k \ge 2$ and infinite), finite ($k \ge 2$), unary (infinite), and finite unary. Some other subregular languages are considered in Section 4.3. Whenever known, results on the range of complexities that can be reached for each operation are also presented. This extension of the notion of magic number to operational state complexity is now an active topic of research.

There are some other survey papers that partially review the results presented here and that served as references for our presentation [178, 179, 180].


4.1.1 General Regular Languages 

Table 1 summarizes the results for general regular languages. The third (fifth) column contains the smallest alphabet size of the witness languages for the state complexity (nondeterministic state complexity) given in the second (fourth) column. Columns with this kind of information also appear in several tables to follow.


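Regular | $sc$ | $|\Sigma|$ | $nsc$ | $|\Sigma|$
$L_1 \cup L_2$ | $mn$ | 2 | $m + n + 1$ | 2
$L_1 \cap L_2$ | $mn$ | 2 | $mn$ | 2
$\overline{L}$ | $m$ | 1 | $2^m$ | 2
$L_1 - L_2$ | $mn$ | 2 | |
$L_1 \oplus L_2$ | $mn$ | 2 | |
$L_1 L_2$ | $m2^n - 2^{n-1}$, if $n > 1$ | 2 | $m + n$ | 2
 | $m$, if $n = 1$ | 1 | |
$L^*$ | $2^{m-1} + 2^{m-l-1}$, if $m > 1$, $l > 0$ | 2 | $m + 1$ | 2
 | $m$, if $m > 1$, $l = 0$ | 1 | |
 | $m + 1$, if $m = 1$ | 1 | |
$L^+$ | $2^{m-1} + 2^{m-l-1} - 1$ | 2 | $m$ | 2
$L^R$ | $2^m$ | 2 | $m + 1$ | 2
$L_2 \backslash L_1$ | $2^m - 1$ | 2 | |
$L_1 / L_2$ | $m$ | 1 | |
$w^{-1}L$ | $m$ | 1 | $m + 1$ |
$Lw^{-1}$ | $m$ | 1 | $m$ | 1

Table 1: State complexity and nondeterministic state complexity for basic operations on regular languages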
In 1994, Yu et al. [184] studied the state complexity of catenation, star, reversal, union, intersection, and left and right quotients. They also studied the state complexity of some operations on unary languages. More than two decades before, in 1970, Maslov [136] had presented some estimates for union, catenation, and star. Although Maslov considered possibly incomplete DFAs, and the paper contains some errors, the binary languages presented are tight witnesses for the upper bounds for those three operations. Rabin and Scott indicated the upper bound $mn$ for the intersection (which also applies to union). Maslov and Yu et al. gave similar witnesses of tightness, both for union and intersection. The families of languages given by Yu et al. for intersection are $\{x \in \{a,b\}^* \mid \#_a(x) \equiv 0 \pmod{m}\}$ and $\{x \in \{a,b\}^* \mid \#_b(x) \equiv 0 \pmod{n}\}$, where $\#_a(x)$ denotes the number of occurrences of $a$ in $x$. Their complements are witnesses for union. Hricko et al. showed that for any integers $m \ge 2$, $n \ge 2$, and $\alpha \in [1, mn]$, there exist binary languages $L_1$ and $L_2$ such that $sc(L_1) = m$, $sc(L_2) = n$, and $sc(L_1 \cup L_2) = \alpha$. Thus, there are no magic numbers for union. The same holds for intersection.
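The upper bound $mn$ comes from the product construction. A minimal Python sketch, assuming complete DFAs encoded as `(delta, start, finals)` with `delta[(state, symbol)]`; the names `product_dfa`, `count_mod_dfa`, and `accepts` are hypothetical. Passing `lambda x, y: x and y` gives intersection and `lambda x, y: x or y` gives union; with Yu et al.'s witnesses all $mn$ pairs of states are reachable (the pair $(i, j)$ by the word $a^i b^j$).

```python
def accepts(delta, start, finals, word):
    """Run a complete DFA on a word."""
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals


def count_mod_dfa(symbol, modulus):
    """Complete DFA over {a, b} for {x | #_symbol(x) = 0 (mod modulus)}."""
    delta = {(i, a): ((i + 1) % modulus if a == symbol else i)
             for i in range(modulus) for a in "ab"}
    return delta, 0, {0}


def product_dfa(alphabet, dfa1, dfa2, combine):
    """Product construction for a Boolean combination of two complete DFAs.

    combine(bool, bool) -> bool decides acceptance of a pair of states.
    Only reachable pairs are built, so there are at most m * n states.
    """
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    start = (start1, start2)
    delta, states, stack = {}, {start}, [start]
    while stack:
        p, q = stack.pop()
        for a in alphabet:
            # Both automata read the same symbol in lockstep.
            target = (delta1[(p, a)], delta2[(q, a)])
            delta[((p, q), a)] = target
            if target not in states:
                states.add(target)
                stack.append(target)
    finals = {(p, q) for (p, q) in states if combine(p in finals1, q in finals2)}
    return states, delta, start, finals
```

Other combining functions, such as `lambda x, y: x and not y` for set difference or `lambda x, y: x != y` for symmetric difference, reuse the same state space, which is why the upper bound $mn$ also applies to them.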

Complementation for DFAs is trivial (one only has to exchange final and non-final states), and thus the state complexity of the complement is the same as that of the original language, i.e., $sc(\overline{L}) = sc(L)$. For other Boolean operations (set difference, symmetric difference, exclusive disjunction, etc.) the state complexity can be obtained by expressing them as a function of union, intersection, and complement.

For catenation, Yu et al. gave the upper bounds $m2^n - f_1 2^{n-1}$ if $m \ge 1$, $n \ge 2$, and $m$ if $m \ge 1$, $n = 1$. They presented binary tight-bound witnesses for $m \ge 1$, $n = 1$ and for $m = 1$, $n \ge 2$, but ternary tight-bound witnesses for $m \ge 2$, $n \ge 2$. However, the bound is tight for
tight bound witnesses for m > 1, re > 2. However, the bound is tight for 
the following binary language families presented by Maslov: {tc G {a, b}* \ 
#aiw) = (rre — 1) (mod m)} and L{{a*b)^~^{a + b){b + a{a -|- 6))*), for 
all rre, re > 2 and /i = 1. Other families of binary languages for which 
the catenation achieves the upper bound were presented by Jiraskova mi- 
Concerning the possible existence of magic numbers, the same author proved that, for all $m$, $n$, and $\alpha$ such that either $n = 1$ and $\alpha \in [1, m]$, or $n \ge 2$ and $\alpha \in [1, m2^n - 2^{n-1}]$, there exist languages $L_1$ and $L_2$ with $sc(L_1) = m$ and $sc(L_2) = n$, defined over a growing alphabet, such that $sc(L_1 L_2) = \alpha$. Moreover, Jirasek et al. [103] showed that the upper bound $m2^n - f_1 2^{n-1}$ on the catenation of two languages $L_1$ and $L_2$, with $sc(L_1) = m \ge 2$ and $sc(L_2) = n \ge 2$, respectively, is tight for any integer $f_1 \in [1, m - 1]$. The witness language families are binary and accepted by the DFAs presented in Figure 2.
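The upper bound $m2^n - f_1 2^{n-1}$ comes from the following construction, sketched here in Python under our own encoding assumptions (complete DFAs as `(delta, start, finals)`; hypothetical function names). A state pairs the current state of $A_1$ with the set of states of $A_2$ that are active; because $q_2$ must belong to that set whenever the first component is final, the $f_1$ final states of $A_1$ can only be combined with $2^{n-1}$ subsets, which gives the bound.

```python
def accepts(delta, start, finals, word):
    """Run a complete DFA on a word."""
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals


def catenation_dfa(alphabet, dfa1, dfa2):
    """DFA for L(A1) L(A2); each DFA is (delta, start, finals), complete."""
    delta1, q1, finals1 = dfa1
    delta2, q2, finals2 = dfa2

    def with_restart(q, subset):
        # Whenever the A1 component is final, a new run of A2 may start.
        subset = frozenset(subset)
        return subset | {q2} if q in finals1 else subset

    start = (q1, with_restart(q1, ()))
    delta, states, stack = {}, {start}, [start]
    while stack:
        q, subset = stack.pop()
        for a in alphabet:
            p = delta1[(q, a)]
            target = (p, with_restart(p, (delta2[(r, a)] for r in subset)))
            delta[((q, subset), a)] = target
            if target not in states:
                states.add(target)
                stack.append(target)
    # Accept iff some active run of A2 is in a final state.
    finals = {(q, s) for (q, s) in states if s & finals2}
    return states, delta, start, finals
```

Only reachable pairs are generated, so for particular operands the result may be much smaller than the bound; witnesses such as those of Figure 2 are exactly the operands for which every admissible pair is reachable and distinguishable.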

Figure 2: Witness DFAs for the whole range of state complexities of catenation

The state complexity of the star of a regular language $L$ was studied by Yu et al. A lower bound of $2^{m-1}$ had been presented before by Ravikumar and Ibarra [161, 160]. If $sc(L) = 1$, then either $L = \Sigma^*$ and $sc(L^*) = 1$, or $L = \emptyset$ and $sc(L^*) = 2$. If $sc(L) = m > 1$ but $l = 0$, i.e., the minimal DFA accepting $L$ has the initial state as its only final state, then $sc(L^*) = m$, as $L = L^*$. Finally, if $sc(L) = m > 1$ and $l > 0$, then $sc(L^*) \le 2^{m-1} + 2^{m-l-1}$.
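The subset-based construction behind the bound $2^{m-1} + 2^{m-l-1}$ can be sketched as follows, under our own encoding and with hypothetical names: a fresh accepting initial state handles the empty word, and from then on a DFA state is a set of states of $A$ to which $q_0$ is added whenever the set contains a final state. For the language $\{w \in \{a,b\}^* \mid \#_a(w) \text{ is odd}\}$ ($m = 2$, $l = 1$) the sketch builds exactly $3 = 2^1 + 2^0$ states.

```python
def accepts(delta, start, finals, word):
    """Run a complete DFA on a word."""
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals


def star_dfa(alphabet, delta, q0, finals):
    """Sketch of the subset-based DFA for L* from a complete DFA for L."""
    def restart(subset):
        # Reaching a final state lets a new factor of L begin at q0.
        subset = frozenset(subset)
        return subset | {q0} if subset & finals else subset

    start = "fresh-start"  # hypothetical name for the new initial state
    trans, states, stack = {}, {start}, [start]
    while stack:
        s = stack.pop()
        # The fresh start behaves like {q0} but additionally accepts epsilon.
        current = frozenset([q0]) if s == start else s
        for a in alphabet:
            target = restart(delta[(q, a)] for q in current)
            trans[(s, a)] = target
            if target not in states:
                states.add(target)
                stack.append(target)
    accepting = {s for s in states if s == start or s & finals}
    return states, trans, start, accepting
```

The fresh initial state is needed because making $q_0$ itself accepting could add unwanted words when $q_0$ has incoming transitions.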

The upper bound $2^{m-1} + 2^{m-2}$ (the case $l = 1$) is achieved for the language $\{w \in \{a,b\}^* \mid \#_a(w) \text{ is odd}\}$ if $m = 2$ and, if $m > 2$, for the family of binary languages accepted by the DFAs presented in Figure [^(ii). We note that although the upper bound given by Maslov is not correct ($\frac{3}{4}2^m - 1$ instead of $\frac{3}{4}2^m$), the family of languages he presented is a witness for the above bound (for $m > 2$). Those languages are accepted by the DFAs presented in Figure 3.



Figure 3: Maslov’s witness DFAs for the state complexity of the star 

Jiraskova proved that for all integers $m$ and $\alpha$ with either $m = 1$ and $\alpha \in [1, 2]$, or $m \ge 2$ and $\alpha \in [1, 2^{m-1} + 2^{m-2}]$, there exists a language $L$ over an alphabet of size $2^m$ such that $sc(L) = m$ and $sc(L^*) = \alpha$. This result was improved by Jiraskova et al. [120] by using an alphabet of size at most $2m$. Again, no gaps or magic numbers exist for the Kleene star operation.

The state complexity of the plus of a regular language $L$ ($L^+ = LL^*$) coincides with that of the star in the first two cases, but for $m > 1$ and $l > 0$ one state is saved (as a new initial state is not needed).

In 1966, Mirkin [142] pointed out that the reversals of the NFAs given by Lupanov as an example of a tight bound for determination (see Figure 1(ii)) are deterministic. This implies that $2^m$ is a tight upper bound for the state complexity of the reversal of an (at least ternary) language $L$ such that $sc(L) = m$. Leiss also studied this problem and proved the tightness of the bound for another family of ternary languages. Yu et al. also (independently) presented Lupanov's example. Salomaa et al. [165] studied several classes of languages for which the upper bound is achieved. Nevertheless, a family of binary languages presented therein as meeting the upper bound for $m > 5$ was later shown not to do so. A family of binary languages for which the upper bound for reversal is tight was given by Jiraskova and Sebej [175, 122], and their minimal DFAs are represented in Figure 4.
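The $2^m$ upper bound for reversal follows from reversing all transitions of the DFA, which yields an NFA whose initial states are the final states $F$ and whose only final state is $q_0$, and then determinizing it from the set $F$. A minimal sketch under our own encoding, with hypothetical names; it does not reproduce the witnesses of Figure 4.

```python
def accepts(delta, start, finals, word):
    """Run a complete DFA on a word."""
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals


def reversal_dfa(alphabet, delta, states, start, finals):
    """DFA for the reversal of L(A), where A is a complete DFA.

    At most 2^m subsets of the m states can appear.
    """
    # Predecessor map of the reversed automaton: (p, a) -> {q | delta(q, a) = p}.
    preds = {(p, a): set() for p in states for a in alphabet}
    for (q, a), p in delta.items():
        preds[(p, a)].add(q)

    rstart = frozenset(finals)
    rdelta, rstates, stack = {}, {rstart}, [rstart]
    while stack:
        subset = stack.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in preds[(p, a)])
            rdelta[(subset, a)] = target
            if target not in rstates:
                rstates.add(target)
                stack.append(target)
    # A subset accepts iff it contains the original initial state.
    rfinals = {s for s in rstates if start in s}
    return rstates, rdelta, rstart, rfinals
```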



Figure 4: Witness DFAs for the state complexity of the reversal 

In the paper cited above [107], Jiraskova showed that for all $m$ and $\alpha$ with $2 \le m \le \alpha \le 2m$, there exists a binary language $L$ such that $sc(L) = m$ and $sc(L^R) = \alpha$. Allowing alphabets of size $2^m$ and $m \ge 3$, the reversal operation has no magic numbers in the range $[\log m, 2^m]$. This result was improved by Sebej [172] by considering an alphabet of size $2m - 2$. Sebej also gives some enhanced partial results for the binary case.

Yu et al. showed that the state complexity of the left quotient of a regular language $L_1$ by an arbitrary language $L_2$, $L_2 \backslash L_1$, is at most $2^m - 1$, where $sc(L_1) = m$, and that this bound is tight for the family of binary languages given in Figure [^(ii), considering $L_2 = \Sigma^*$. In 1971, Conway had already stated that if $L_2$ is a regular language then $sc(L_2 \backslash L_1) \le 2^m$. For the right quotient of a regular language $L_1$ by an arbitrary language $L_2$, one has $sc(L_1 / L_2) \le m$: a DFA accepting $L_1 / L_2$ coincides with the minimal DFA for $L_1$, except that its set of final states is the set of states $q \in Q_1$ for which there exists a word $w \in L_2$ such that $\delta_1(q, w) \in F_1$. The bound is tight for $L_2 = \{\varepsilon\}$. For the left and right quotients of a regular language $L$ by a word $w \in \Sigma^*$, it is then easy to see that $sc(w^{-1}L) \le m$ and $sc(Lw^{-1}) \le m$. As a family of languages for which the upper bound is tight, consider $(a^m)^*$ and $w \in \{a\}^*$.
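The right-quotient construction described above only changes the set of final states. The sketch below assumes, unlike the text, that $L_2$ is regular and given by a complete DFA, so that the existence of a word $w \in L_2$ with $\delta_1(q, w) \in F_1$ can be decided by a reachability search in the product of $A_1$ and the DFA for $L_2$. The function name and encoding are hypothetical.

```python
def right_quotient_finals(alphabet, delta1, states1, finals1, dfa2):
    """New final states for a DFA accepting L1 / L2, where L2 = L(dfa2).

    q becomes final iff some w in L2 leads A1 from q into F1, i.e. iff from
    (q, q2) the product automaton reaches a pair in F1 x F2. Transitions and
    the initial state of A1 are kept, so at most m states are needed.
    """
    delta2, q2, finals2 = dfa2
    new_finals = set()
    for q in states1:
        seen, stack = {(q, q2)}, [(q, q2)]
        while stack:
            p, r = stack.pop()
            if p in finals1 and r in finals2:
                new_finals.add(q)
                break
            for a in alphabet:
                nxt = (delta1[(p, a)], delta2[(r, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return new_finals
```

For example, if $L_1 = \{x \in \{a,b\}^* \mid \#_a(x) \equiv 0 \pmod 3\}$ and $L_2 = \{a\}$, only the state counting two $a$'s becomes final, as expected for $L_1 / \{a\}$.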

The nondeterministic state complexity of basic operations was first studied by Holzer and Kutrib [90], and also by Ellul. We note that, for state complexity purposes, it is equivalent to consider NFAs with or without $\varepsilon$-transitions. NFAs are considered with only one initial state and trimmed, i.e., all states are accessible from the initial state and from every state some final state can be reached.

For union, only a new initial state with $\varepsilon$-transitions to the initial states of both operands is needed; thus $nsc(L_1 \cup L_2) \le m + n + 1$. To see that this upper bound is tight, consider the families $(a^m)^*$ and $(b^n)^*$ over a binary alphabet. For intersection, a product construction is needed, which gives $nsc(L_1 \cap L_2) \le mn$.
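The $m + n + 1$ bound for union can also be realized without $\varepsilon$-transitions: the fresh initial state simply copies the outgoing transitions of both old initial states, and it is final if either old initial state is. A sketch under our own encoding (NFAs as `(delta, start, finals)` with `delta[(state, symbol)]` a set of states, disjoint state names assumed); the names are hypothetical.

```python
def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of active states."""
    current = {start}
    for a in word:
        current = {r for q in current for r in delta.get((q, a), ())}
    return bool(current & finals)


def nfa_union(nfa1, nfa2):
    """Epsilon-free union NFA with m + n + 1 states (disjoint state names)."""
    delta1, s1, f1 = nfa1
    delta2, s2, f2 = nfa2
    new_start = "new-start"  # hypothetical fresh state name
    delta = {}
    for d in (delta1, delta2):
        for key, targets in d.items():
            delta.setdefault(key, set()).update(targets)
    # The fresh initial state copies the transitions of both old initial states.
    for (q, a), targets in list(delta.items()):
        if q in (s1, s2):
            delta.setdefault((new_start, a), set()).update(targets)
    finals = set(f1) | set(f2)
    if s1 in f1 or s2 in f2:
        finals.add(new_start)  # the empty word belongs to the union
    return delta, new_start, finals
```

For the witnesses $(a^3)^*$ and $(b^2)^*$, for example, the result accepts exactly the words $a^{3i}$ and $b^{2j}$.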

The nondeterministic state complexity of complementation is, trivially, at most $2^m$ (determinize, then exchange final and non-final states). That this upper bound is tight even for binary languages was proved by Jiraskova [106] using a fooling-set lower-bound technique. Those languages are accepted by the NFAs presented in Figure 5 (for $m > 2$).



Figure 5: Witness NFAs for the nondeterministic state complexity of complementation

See Holzer and Kutrib [93] for other witness languages. Using the same techniques, Jiraskova and Szabari proved that for all integers $m > 1$ and $\alpha \in [\log m, 2^m]$ there exists a language $L$ over an alphabet of exponentially growing size such that $nsc(L) = m$ and $nsc(\overline{L}) = \alpha$. This result was improved to a five-symbol alphabet by Jiraskova [107].

Mera and Pighizzini proved a related best-case result: for every $m > 2$ there exists a language $L$ such that $nsc(L) = m$, $nsc(\overline{L}) \le m + 1$, and $sc(L) = sc(\overline{L}) = 2^m$. However, as we will see below, this result does not hold if unary languages are considered.

The upper bound for the nondeterministic state complexity of catenation is $m + n$, and this bound is reached by the binary witness languages given for union. All values $\alpha \in [1, m + n]$ can be obtained as the nondeterministic state complexity of the catenation of unary languages.

For the plus of a regular language $L$, we have $nsc(L^+) \le nsc(L) = m$: an NFA accepting $L^+$ coincides with one accepting $L$, except that each final state also has an $\varepsilon$-transition to the initial state. In the case of the star, one more state may be needed (if $L$ does not contain the empty word), i.e., $nsc(L^*) \le m + 1$. Witness languages for the tightness of these bounds are $\{w \in \{a,b\}^* \mid \#_a(w) \equiv m - 1 \pmod{m}\}$. The whole range of possible values can be reached for the nondeterministic state complexity of the star of binary languages.
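The plus and star constructions above can also be expressed without $\varepsilon$-transitions. In this sketch (our own encoding and hypothetical names), every transition entering a final state is also directed to the initial state, which is equivalent to the $\varepsilon$-transition from each final state back to the initial state; for the star, one fresh accepting initial state is added only when $\varepsilon \notin L$, matching the bounds $m$ and $m + 1$.

```python
def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of active states."""
    current = {start}
    for a in word:
        current = {r for q in current for r in delta.get((q, a), ())}
    return bool(current & finals)


def nfa_plus(delta, start, finals):
    """Epsilon-free NFA for L+ on the same m states."""
    plus = {key: set(targets) for key, targets in delta.items()}
    for targets in plus.values():
        if targets & finals:
            targets.add(start)  # finishing a factor may also restart at q0
    return plus, start, set(finals)


def nfa_star(delta, start, finals):
    """NFA for L*: the L+ automaton, plus one fresh accepting initial state
    when the empty word is not already accepted (m + 1 states)."""
    plus, _, plus_finals = nfa_plus(delta, start, finals)
    if start in finals:
        return plus, start, plus_finals
    new_start = "star-start"  # hypothetical fresh state name
    for (q, a), targets in list(plus.items()):
        if q == start:
            plus[(new_start, a)] = set(targets)
    return plus, new_start, plus_finals | {new_start}
```

For example, starting from the 3-state NFA for $\{ab\}$, the sketches accept exactly $(ab)^+$ and $(ab)^*$, respectively.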

For the reversal, at most one more state is needed, so $nsc(L^R) \le m + 1$. Witness ternary languages were presented by Holzer and Kutrib, but the bound is tight even for the family of binary languages ($m > 1$) whose minimal NFAs are presented in Figure 6. If $nsc(L) = m > 3$, the possible values for $nsc(L^R)$ are $m - 1$, $m$, or $m + 1$. The first value is reached by the reversals of the above binary languages and the second by the languages $\{w \in \{a,b\}^* \mid |w| \equiv 0 \pmod{m}\}$.



Figure 6: Witness NFAs for the nondeterministic state complexity of reversal 


The nondeterministic state complexity of left and right quotients by a word was studied by Ellul [54]. Given a minimal NFA $A = (Q, \Sigma, \delta, q_0, F)$ accepting $L$, an NFA $C$ accepting $Lw^{-1}$, for $w \in \Sigma^*$, coincides with $A$ except that its set of final states is $\{q \in Q \mid \delta(q, w) \cap F \neq \emptyset\}$. Thus $nsc(Lw^{-1}) \le nsc(L)$. The witness languages used for the state complexity of the right quotient show that the bound is tight. An upper bound for $nsc(w^{-1}L)$ can be obtained by considering an NFA $C$ with one new initial state $q_0'$ and $\varepsilon$-transitions from $q_0'$ to each state of $A$ reached when reading $w$.

Universal Witnesses. Brzozowski identified a ternary family of languages $U_m(a,b,c)$ which provides witnesses for the state complexity of all operations considered in the previous section. The family, presented in Figure 7, also fulfils other conditions that, according to the same author, should be verified by the most difficult (regular) languages. For a language $L_m$ the suggested conditions are:

(1) The state complexity should be $m$.

(2) The state complexity of each quotient of $L_m$ should be $m$.

(3) The number of atoms of $L_m$ should be $2^m$. An atom of a regular language with quotients $K_0, \ldots, K_{m-1}$ is a non-empty intersection of the form $\widetilde{K_0} \cap \cdots \cap \widetilde{K_{m-1}}$, where each $\widetilde{K_i}$ is either $K_i$ or $\overline{K_i}$. Thus the number of atoms is bounded from above by $2^m$, and it was proved by Brzozowski et al. [32, 31] that this bound is tight.¹ Every quotient of $L_m$ is a union of atoms.

¹ We also notice that the number of atoms of a language $L$ is equal to the state complexity of $L^R$.


(4) The state complexity of each atom of L_m should be maximal. It was shown [33] that the complexity of the atoms with 0 or m complemented quotients is bounded from above by 2^m − 1, and the complexity of any atom with r complemented quotients, where 1 ≤ r ≤ m − 1, by

$$ f(m,r) = 1 + \sum_{k=1}^{r} \sum_{h=k+1}^{k+m-r} \binom{m}{h}\binom{h}{k}. $$

(5) The syntactic semigroup of L_m should have cardinality m^m, which is well known to be a tight upper bound [136]. This measure, called the syntactic complexity of a language, has recently been studied for many classes of subregular languages [89, 128, 28, 35, 26, 27, 31].
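Conditions (3) and (4) can be explored computationally. The sketch below assumes the usual presentation of U_m(a,b,c): a is the cycle (0 1 ... m−1), b swaps states 0 and 1, and c maps m − 1 to 0. This should be checked against Figure 7. The function count_atoms relies on the correspondence, noted in the footnote, between atoms and the states of the minimal DFA of the reversal. The function atom_bound evaluates the bound f(m, r). All function names are hypothetical.

```python
from math import comb


def universal_witness(m):
    """U_m(a, b, c) in its usual presentation (an assumption here): a is the cycle
    (0 1 ... m-1), b swaps states 0 and 1, c maps m-1 to 0 and fixes the other
    states; the initial state is 0 and the only final state is m-1."""
    delta = {}
    for q in range(m):
        delta[q, "a"] = (q + 1) % m
        delta[q, "b"] = {0: 1, 1: 0}.get(q, q)
        delta[q, "c"] = 0 if q == m - 1 else q
    return delta, {m - 1}


def count_atoms(delta, finals, alphabet="abc"):
    """Number of atoms = number of subsets reached when the reversed DFA is
    determinized starting from the final states (complete, accessible DFA
    assumed). The subset reached on w^R is the set of states q with w in K_q."""
    start = frozenset(finals)
    seen, todo = {start}, [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            pre = frozenset(q for (q, x), p in delta.items() if x == a and p in subset)
            if pre not in seen:
                seen.add(pre)
                todo.append(pre)
    return len(seen)


def atom_bound(m, r):
    """Upper bound on the state complexity of an atom with r complemented quotients."""
    if r in (0, m):
        return 2 ** m - 1
    return 1 + sum(comb(m, h) * comb(h, k)
                   for k in range(1, r + 1)
                   for h in range(k + 1, k + m - r + 1))
```

For m = 3, the determinized reversal of U_3 reaches all 2^3 = 8 subsets, as condition (3) requires. The bounds for the atoms are 7, 10, 10 and 7 for r = 0, 1, 2, 3.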

The following result [?] can be considered a milestone in the operational state complexity of regular languages, where U_m is depicted in Figure 7:

{U_m(a,b,c) | m > 3} meets conditions (1)–(5) and is a witness for the reversal and the star. The families {U_m(a,b,c) | m > 3} and {U_n(b,a,c) | n > 3} are witnesses for the Boolean operations, whereas {U_m(a,b,c) | m > 3} and {U_n(a,b,c) | n > 3} are witnesses for catenation.


Variants of the universal witness were also given for several combined operations. Whether there are universal witnesses for other operations, classes of subregular languages or other complexity measures is an open problem (see [20]). However, when searching for witnesses for a given upper bound, ensuring that the above conditions (or some of them) are satisfied can be a good starting point. Moreover, the study of properties that may enforce (some of) the conditions (1)–(5) is fundamental for a better understanding of operational state complexity [?].


4.1.2 Unary Regular Languages 

Table 2 presents the main state complexity results of the basic operations on unary languages. Given the constraints on both DFAs and NFAs over a one-symbol alphabet, and the results presented in a previous section, the state complexity of several operations on unary languages is much lower than what is predicted by the general results. Some results on the average-case state complexity of operations on unary languages were presented by Nicaud [147, 148].

Unary Regular

Operation | sc                            | nsc                              | asc
L1 ∪ L2   | ~ mn                          | m + n + 1, if m ∤ n and n ∤ m    | ~ (3ζ(3)/(2π²)) mn
L1 ∩ L2   | ~ mn                          | mn, if (m, n) = 1                | ~ (3ζ(3)/(2π²)) mn
L̄         | m                             | e^{Θ(√(m log m))}                |
L1L2      | ~ mn                          | [m + n − 1, m + n], if m, n > 1  | O(1), if n < P(m)
L*        | (m − 1)² + 1, if m > 1, l > 1 | m + 1, if m > 2                  | O(1)
L+        | (m − 1)²                      | m, if m > 2                      |
L^R       | m                             | m                                |
w^{-1}L   | m                             | m                                |
Lw^{-1}   | m                             | m                                |

Table 2: State complexity (sc), nondeterministic state complexity (nsc) and average state complexity (asc) of basic operations on unary languages. The ~ symbol means that the complexities are asymptotically equal to the given values. The upper bounds of state complexity for union, intersection and catenation are exact if the greatest common divisor (m, n) of m and n is 1. For the average state complexity of intersection and union, ζ is the Riemann zeta function. For the average state complexity of catenation, n must be bounded by a polynomial P in m.

Figure 7: Universal witness DFAs, U_m(a,b,c).

A DFA that accepts a unary language is characterized by a noncyclic part (the tail) and a cyclic part (the loop). A characterization and the enumeration of minimal unary DFAs were given by Nicaud [147].

The state complexity of the reversal of a unary language L is trivially equal to the state complexity of L, since L^R = L. The state complexities of Boolean operations on unary languages coincide asymptotically with the ones on general regular languages. Yu [179] showed that the bound is tight for union (and thus for intersection) if m and n are coprime, with witness languages (a^m)* and (a^n)*. The state complexity of catenation and star was established by Yu et al., and the tightness of the former was also shown for coprime m and n. The witnesses for catenation are (a^m)*a^{m−1} and (a^n)*a^{n−1}. For the star, if m = 2 a witness is (aa)*, and for each m > 2 a witness is (a^m)*a^{m−1}. The state complexity when m and n are not necessarily coprime was studied by Pighizzini and Shallit [153, 154]. In this case, the tight bounds are given by the number of states in the tail and in the loop of the resulting automata. The state complexity of left and right quotients by a word on unary languages coincides with the general case.

Nicaud [147, 148] proved that the state complexity of union, intersection and catenation on two languages L1 and L2 is asymptotically equivalent to mn, where m = sc(L1) and n = sc(L2). Let D_n be the set of unary (complete and initially connected) DFAs with n states. The average state complexity (asc) of a binary operation ∘ on regular languages is given by

$$ \mathrm{asc}(\circ) = \frac{1}{|D_m|\,|D_n|} \sum_{(A_1, A_2) \in D_m \times D_n} \mathrm{sc}\big(L(A_1) \circ L(A_2)\big). $$

This definition can be generalized to operations with other arities, other kinds of automata and other classes of languages.
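The tail-and-loop structure makes unary DFAs easy to manipulate, and the definition of asc can be evaluated by brute force for very small m and n. The following Python sketch is only an illustration under stated assumptions. A unary DFA is encoded by its successor list and finality flags: states 0, ..., t + p − 1, with the last state looping back to the tail length t. Products keep only the reachable pairs, and minimization uses Moore's partition refinement. The sketch also checks Yu's witnesses (a^m)* and (a^n)*: for the coprime values m = 2 and n = 3, the union needs mn = 6 states. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from operator import and_, or_


def unary_dfa(tail, loop, accepting):
    """States 0..tail+loop-1; state i moves to i+1 on 'a', except the last
    state, which moves back to state `tail` (the first state of the loop)."""
    n = tail + loop
    nxt = [i + 1 if i + 1 < n else tail for i in range(n)]
    fin = [i in accepting for i in range(n)]
    return nxt, fin


def product(d1, d2, accept):
    """Reachable part of the product automaton; `accept` combines finality."""
    (nxt1, fin1), (nxt2, fin2) = d1, d2
    index, pair = {}, (0, 0)
    while pair not in index:          # the run from (0, 0) is again a tail plus a loop
        index[pair] = len(index)
        pair = (nxt1[pair[0]], nxt2[pair[1]])
    pairs = list(index)
    nxt = [index[(nxt1[a], nxt2[b])] for a, b in pairs]
    fin = [accept(fin1[a], fin2[b]) for a, b in pairs]
    return nxt, fin


def sc(dfa):
    """Number of states of the minimal DFA (Moore refinement); all states are
    assumed reachable, which holds for the automata built here."""
    nxt, fin = dfa
    block = [int(f) for f in fin]
    while True:
        sig = [(block[q], block[nxt[q]]) for q in range(len(nxt))]
        ids = {s: i for i, s in enumerate(dict.fromkeys(sig))}
        new = [ids[s] for s in sig]
        if len(set(new)) == len(set(block)):
            return len(set(new))
        block = new


def all_unary_dfas(n):
    """All complete, initially connected unary DFAs with n states (n * 2**n)."""
    for tail in range(n):
        for k in range(n + 1):
            for acc in combinations(range(n), k):
                yield unary_dfa(tail, n - tail, set(acc))


def average_sc(m, n, accept):
    """Brute-force average of sc(L(A1) op L(A2)) over D_m x D_n."""
    total = count = 0
    for a1 in all_unary_dfas(m):
        for a2 in all_unary_dfas(n):
            total += sc(product(a1, a2, accept))
            count += 1
    return Fraction(total, count)
```

Enumerating D_n as pairs (tail length, set of final states) gives |D_n| = n·2^n automata. The exhaustive loop is exponential, so it only makes the definition concrete and does not reproduce Nicaud's asymptotic analysis.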
As shown in Table 2, the average state complexities of catenation and star on unary languages are bounded by a constant, and for intersection (and union) note that 3ζ(3)/(2π²) ≈ 0.1826907423. Magic numbers for the star operation on unary languages were studied by Cevorova [?]. Considering the gap between the worst-case upper bound, n² − 2n + 2, and the average case (bounded by a constant), it is not surprising that for every n no more than 4 complexities are attainable between n² − 4n + 6 and the upper bound. In the same paper, the author also establishes a relation between this problem and the Frobenius problem.
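One way to see the link between unary star and the Frobenius problem is the following. If L = {a^k | k ∈ K} is finite with gcd(K) = 1, then L* is co-finite. Its minimal complete DFA has g + 2 states, where g is the largest length not expressible as a sum of elements of K (the Frobenius number of K), or a single state if every length is expressible. The sketch below computes this with a simple dynamic program. It illustrates only this special case and does not reproduce Cevorova's analysis. The function name is hypothetical.

```python
from functools import reduce
from math import gcd


def star_sc_finite_unary(lengths):
    """State complexity of L* for L = {a^k : k in lengths} (finite), assuming
    gcd(lengths) == 1 so that L* is co-finite."""
    lengths = sorted(set(lengths) - {0})
    assert lengths and reduce(gcd, lengths) == 1
    # The Frobenius number is below min * max (Schur's bound), so a DP up to it suffices.
    bound = lengths[0] * lengths[-1]
    representable = [False] * (bound + 1)
    representable[0] = True                      # epsilon is always in L*
    for k in range(1, bound + 1):
        representable[k] = any(k >= l and representable[k - l] for l in lengths)
    gaps = [k for k in range(bound + 1) if not representable[k]]
    if not gaps:
        return 1                                 # L* = a*
    g = gaps[-1]                                 # Frobenius number of the length set
    return g + 2                                 # states a^0..a^g plus one accepting loop state
```

For two coprime lengths p and q, this agrees with the classical formula g = pq − p − q. For example, {a^6, a^9, a^20}* needs 45 states, since 43 is the largest length that cannot be expressed.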

The nondeterministic state complexity of basic operations on unary languages was studied by Holzer and Kutrib [?], and also by Ellul [54]. For union and intersection, the upper bound coincides with the general case. However, it was proved to be achievable for union if m is neither a divisor nor a multiple of n. As in the deterministic case, the witnesses for intersection are (a^m)* and (a^n)*, if m and n are coprime. The nondeterministic state complexity of complementation is O(F(m)), where F is the Landau function introduced earlier, which is directly related to the state complexity of determinization. Holzer and Kutrib [?] proved that this upper bound is tight in order of magnitude, i.e., for any integer m > 1 there exists a unary language L such that nsc(L) = m and nsc(L̄) = Ω(F(m)). Moreover, Mera and Pighizzini [139] have shown that for each m > 1 and each unary language L such that nsc(L) = m and sc(L) = sc(L̄) = e^{Θ(√(m log m))}, we have nsc(L̄) > m. The upper bound m + n for the catenation of two unary languages is not known to be tight. The known lower bound is m + n − 1, achieved by the catenation of {a^l | l ≡ m − 1 (mod m)} and {a^l | l ≡ n − 1 (mod n)} [91]. The same languages can be used to show the tightness of the bound m + 1 for the star (and the plus) operation. For the left and right quotients, notice that in the unary case w^{-1}L = Lw^{-1}, and the results for the general case apply.
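Landau's function F(m), the maximal least common multiple of the parts of a partition of m, governs the unary determinization and complementation bounds above. Below is a small Python sketch for computing it. It relies on the fact that an optimal partition may be taken to consist of powers of distinct primes, padded with parts equal to 1. The function names are hypothetical.

```python
from math import isqrt


def primes_up_to(n):
    return [p for p in range(2, n + 1) if all(p % d for d in range(2, isqrt(p) + 1))]


def landau(n):
    """Landau's function F(n): the maximum lcm of the parts of a partition of n.
    Computed as a knapsack over primes, each prime contributing at most one power."""
    best = [1] * (n + 1)      # best[s]: max product of prime powers with total size <= s
    for p in primes_up_to(n):
        updated = best[:]
        for s in range(n + 1):
            q = p
            while q <= s:
                updated[s] = max(updated[s], best[s - q] * q)
                q *= p
        best = updated
    return best[n]
```

For instance, F(10) = 30, obtained from the partition 2 + 3 + 5. Asymptotically, ln F(m) ~ √(m ln m), which is where the e^{Θ(√(m log m))} bounds in Table 2 come from.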

4.1.3 Finite Languages 

Finite languages are an important subset of regular languages. They are accepted by complete DFAs that are acyclic apart from a loop on the sink (or dead) state for all alphabet symbols. Minimal DFAs for finite languages also have special graph properties that lead to a linear-time minimization algorithm [162], and the length of the longest word in the language plays an important role. Table 3 shows that the (nondeterministic) state complexities of operations on finite languages are, in general, lower than in the general case.
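Below is a sketch of height-based minimization for acyclic DFAs, in the spirit of the linear-time algorithm cited above. It is not claimed to be the exact algorithm of [162]. States are processed by increasing height (the length of the longest path to a leaf). Two states are merged when they agree on finality and on the representatives of their successors. For brevity the sketch sorts by height, which costs O(n log n); bucketing by height would give linear time. The input is assumed to be a trim, partial DFA given as a dict of dicts.

```python
def minimize_acyclic(trans, finals, start):
    """Number of states of the minimal trim DFA equivalent to an acyclic, trim DFA.
    trans maps each state to a dict {symbol: successor}."""
    states = {start} | set(finals) | set(trans)
    for edges in trans.values():
        states |= set(edges.values())
    height = {}

    def h(q):  # length of the longest path from q; leaves have height 0
        if q not in height:
            height[q] = 1 + max((h(t) for t in trans.get(q, {}).values()), default=-1)
        return height[q]

    rep, by_signature = {}, {}
    for q in sorted(states, key=h):   # successors always have a smaller height
        signature = (q in finals,
                     tuple(sorted((a, rep[t]) for a, t in trans.get(q, {}).items())))
        rep[q] = by_signature.setdefault(signature, q)
    return len(set(rep.values()))


# Hypothetical example: the trie of {ab, bb}; the two leaves and the two middle states merge.
TRIE = {0: {"a": 1, "b": 2}, 1: {"b": 3}, 2: {"b": 4}}
```

The complete minimal DFA has one extra state, the sink.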
Finite

Operation | sc                                   | |Σ|     | nsc                     | |Σ|
L1 ∪ L2   | mn − (m + n)                         | f(m, n) | m + n − 2               | 2
L1 ∩ L2   | mn − 3(m + n) + 12                   | f(m, n) | mn                      | 2
L̄         | m                                    | 1       | O(k^{m/(1 + log k)})    | 2
L1L2      | (m − n + 3)2^{n−2} − 1, m + 1 > n    | 2       | m + n − 1               | 2
          | m + n − 2, if l = 1                  | 1       |                         |
L*        | 2^{m−3} + 2^{m−l−2}, l > 2, m > 4    | 3       | m − 1, m > 1            | 1
          | m − 1, if l = 1                      | 1       |                         |
L+        | m                                    | 1       | m, m > 1                | 1
L^R       | O(k^{m/(1 + log k)})                 | 2       | m                       | 2

Table 3: State complexity and nondeterministic state complexity of basic operations on finite languages


Campeanu et al. [36] presented the first formal study of the state complexity of operations on finite languages. Yu [179] presented upper bounds of O(mn) for union and intersection. The tight upper bounds were given by Han and Salomaa [?] using alphabets of growing size. The upper bounds for union and intersection cannot be reached with a fixed alphabet when m and n are arbitrarily large. Campeanu et al. gave tight upper bounds for catenation, star and reversal. For catenation, the bound (m − n + 3)2^{n−2} − 1 is tight for binary languages if m + 1 > n > 2. The DFAs of the witness languages are presented in Figure 8.



Figure 8: Witness DFAs for the state complexity of catenation on finite languages 

For star, Campeanu et al. showed that the bound 2^{m−3} + 2^{m−l−2} is tight for ternary languages. The tight upper bound for the reversal of a finite binary language is 3 · 2^{p−1} − 1 if m = 2p, and 2^p − 1 if m = 2p − 1.
Nondeterministic state complexity of basic operations on finite languages was studied by Holzer and Kutrib [?]. Minimal NFAs accepting finite languages without the empty word can be assumed to have only one final state (with no outgoing transitions); if the empty word is in the language, the initial state is also final. Because there are no cycles, three states can be saved for the union of two finite languages: no new initial state is needed, and the initial states and the final states can be merged. The upper bound m + n − 2 is tight for the languages a^{m−1} and b^{n−1}, for m, n > 2. In the case of intersection, the upper bound coincides with the general case, and it is tight for the binary languages {w ∈ {a,b}* | #_a(w) ≡ 0 (mod m)} and {w ∈ {a,b}* | #_b(w) ≡ 0 (mod n)}. Considering the upper bound of determinization for finite languages, the nondeterministic state complexity of complementation is bounded by O(k^{m/(1+log k)}). A weaker lower bound is reached for alphabets Σ = {a_1, ..., a_k} of size k > 2 by a family of languages with parameters i > 0, 0 < j < i, y ∈ Σ \ {a_1}, and m > 2. However, a tighter lower bound of Ω(k^{m/(1+log k)}) can be derived from the lower bound for determinization. For catenation of finite languages represented by NFAs, one state can be saved. Witness languages for the tightness of the bound m + n − 1 can be the ones used for union. Two states are also saved for the star, and for plus the nondeterministic state complexity coincides with the one for the general case. Witness languages are a^m and [?], respectively. NFAs for the reversal are exponentially more succinct than DFAs. In the case of finite languages, as for other operations, one state can be spared. Witness languages are {a,

4.1.4 Finite Unary Languages 

Table 4 summarizes the state complexity and nondeterministic state complexity results of basic operations on finite unary languages [36, 179, 107]. The state complexities of union, intersection and catenation on finite unary languages are linear, while they are quadratic for general unary languages. In this setting, nondeterminism is only relevant for the star (and plus), since these operations may yield infinite unary regular languages. As already stated, for a finite unary language L, one has sc(L) ≤ nsc(L) + 1, and sc(L) − 2 is the length of the longest word in the language. If an operation preserves finiteness, only the longest words need to be considered for its state complexity.
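For a finite unary language, the count is easy to check with Myhill–Nerode residuals. The words a^0, ..., a^{ℓ+1} (with ℓ the longest length) have pairwise distinct residuals, and every longer word has the empty residual. The Python sketch below uses hypothetical helper names and encodes languages as sets of word lengths. It confirms that sc(L) = ℓ + 2. It also shows that catenation adds the lengths of the longest words, which gives m + n − 2 states.

```python
def sc_finite_unary(lengths):
    """sc of L = {a^k : k in lengths}: number of distinct residuals a^{-i}L
    for i = 0, ..., max + 1 (all longer prefixes share the empty residual)."""
    top = max(lengths, default=-1)
    residuals = {frozenset(k - i for k in lengths if k >= i) for i in range(top + 2)}
    return len(residuals)


def catenate(l1, l2):
    """Lengths of the words of L1L2 for unary languages given by length sets."""
    return {x + y for x in l1 for y in l2}
```

For L1 = {ε, a^3} and L2 = {a, a^2}, the state complexities are 5 and 4, their catenation needs 5 + 4 − 2 = 7 states, and their union needs max{5, 4} = 5.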
Finite Unary

Operation | sc                               | nsc
L1 ∪ L2   | max{m, n}                        | max{m, n}
L1 ∩ L2   | min{m, n}                        | min{m, n}
L̄         | m                                | m + 1
L1 − L2   | m                                |
L1 ⊕ L2   | max{m, n}                        |
L1L2      | m + n − 2                        | m + n − 1
L*        | 2, if m = 3                      | m − 1
          | m − 1, if l = 1                  |
          | m² − 7m + 13, if m > 4, l > 3    |
L+        | m                                | m
L^R       | m                                | m

Table 4: State complexity and nondeterministic state complexity of basic operations on finite unary languages
4.2 Other Regularity Preserving Operations

Table 5 presents the results for the state complexity of some regularity-preserving operations, which are detailed in the following paragraphs.

Proportional removals The proportional removals preserving regularity were studied by Hartmanis [174] and fully characterized by Seiferas and McNaughton [173]. For any binary relation r ⊆ ℕ × ℕ and any language L ⊆ Σ*, let the language P(r, L) be defined as

P(r, L) = {x ∈ Σ* | ∃y ∈ Σ* such that xy ∈ L ∧ r(|x|, |y|)}.

A relation r is regularity-preserving if P(r, L) is regular for every regular language L. Seiferas and McNaughton [173] gave necessary and sufficient conditions for regularity preservation in this context. For the special case where r is the identity, the corresponding language is denoted by ½(L). Domaratzki [51] proved that for a regular language L, sc(½(L)) = O(sc(L)F(sc(L))), where F is Landau's function, and that this bound is tight for ternary languages. When L is a unary language, one gets sc(½(L)) = sc(L). Following Nicaud's work on average-case complexity, mentioned above, Domaratzki showed that the average state complexity of the ½(·) operation on an m-state unary automaton is asymptotically equivalent to a fixed fraction of m plus some constant c. Domaratzki also studied the state complexity of polynomial removals. Let f ∈ ℤ[x] be a strictly monotonic polynomial such that f(ℕ) ⊆ ℕ. Then the relation r_f = {(n, f(n)) | n > 0} preserves regularity, and sc(P(r_f, L)) = O(sc(L)F(sc(L))).

In 1970, Maslov [136] had already studied the language (p/q)(L), i.e., a language P(r, L) such that r is defined by {(m, n) | mq = pn} with p, q ∈ ℕ. An open problem is to obtain the state complexity of P(r, L) where r belongs to the broader class of regularity-preserving relations studied by Seiferas and McNaughton.
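For the identity relation, ½(L) can be recognized with a standard construction, assumed here; the constructions in the cited papers may differ in details. Alongside the current state p = δ(q0, x), it tracks the set B_k of states from which some word of length k = |x| reaches a final state. Here B_0 = F, and B_{k+1} is the set of states having a transition into B_k. Then x ∈ ½(L) iff p ∈ B_{|x|}. Since the sequence B_0, B_1, ... is eventually periodic, only finitely many pairs (p, B_k) occur. The sketch below compares this construction with a brute-force check on a hypothetical DFA MOD3. All names are hypothetical.

```python
from itertools import product


def all_words(alphabet, max_len):
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


def dfa_accepts(dfa, word):
    delta, start, finals = dfa        # delta: dict state -> dict symbol -> state (complete)
    q = start
    for a in word:
        q = delta[q][a]
    return q in finals


def predecessors(delta, targets):
    """B_{k+1}: states with a transition (on any symbol) into B_k."""
    return frozenset(q for q in delta if any(t in targets for t in delta[q].values()))


def half_accepts(dfa, x):
    """x is in 1/2(L) iff delta(q0, x) lies in B_|x|."""
    delta, start, finals = dfa
    p, reach = start, frozenset(finals)
    for a in x:
        p = delta[p][a]
        reach = predecessors(delta, reach)
    return p in reach


def half_bruteforce(dfa, x):
    delta, start, _ = dfa
    alphabet = sorted(delta[start])
    return any(dfa_accepts(dfa, x + "".join(y)) for y in product(alphabet, repeat=len(x)))


def half_pair_states(dfa):
    """Number of reachable pairs (p, B_k): an upper bound on sc(1/2(L))."""
    delta, start, finals = dfa
    first = (start, frozenset(finals))
    seen, todo = {first}, [first]
    while todo:
        p, reach = todo.pop()
        next_reach = predecessors(delta, reach)
        for a in delta[p]:
            pair = (delta[p][a], next_reach)
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return len(seen)


# Hypothetical example: words over {a, b} whose number of a's is divisible by 3.
MOD3 = ({q: {"a": (q + 1) % 3, "b": q} for q in range(3)}, 0, {0})
```

In MOD3, the word a is not in ½(L): a single extra letter cannot bring the number of a's to a multiple of 3. The word aa is in ½(L), for example via aaa·b... no, via the split aa·a, since the second half may itself contain an a.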

Nondeterministic state complexity of polynomial removals was studied by Goc et al. [?]. The authors showed an O(n²) upper bound and a matching lower bound in the case where the polynomial is a sum of monomials and a constant, or when the polynomial has rational roots.

Power Given a regular language L and i > 2, an upper bound for the state complexity of the language L^i is given by considering the state complexity of catenation. However, a tight upper bound is obtained if this operation is studied individually. Domaratzki and Okhotin [52] proved that sc(L^i) = Θ(m2^{(i−1)m})
Regular


sc S 

nsc S 

Ul) 

logm) 

3 

0{m?‘) 


m 

1 

u 


6 

im 

2 

im — i + 1 

1 



4 

lCS 

2 'W-^+m log m—0{m) 

4 

1, if m = 1 

2 

20{m?) 

2,3 

2m? + 1, if m > 2 

m 

1 

m 

1 

Li LU L 2 

0(2^n _ 

5 

0{mn) 

5 

Lx 0_L L 2 

^2^-1 _ 2n-2^ 

if m > 3, n > 4 

4 

m + n 

2 

Unique Regular Operations 

-Li U L 2 

mn 

2 



0 

to 

0{mT - 


> 2^(0 


0 

to 

rn^m _ gm-l 

2 



L° 

0(3m-i + (/ + 2)3™-^-i 

- 2)) 





Table 5: State complexity and nondeterministic state complexity of some regularity-preserving operations: proportional removals for the identity relation (½(L)); power L^i, where i > 2; cyclic shift L^↻; shuffle L1 ⧢ L2; orthogonal catenation L1 ⊙⊥ L2; unique operations: for unique star L°, ε ∉ L; for the nondeterministic state complexity of L1 ∘ L2, the combined state complexity of L1 and L2 is O(h), for h > 0.
for i > 2. The bound is tight for a family of languages over a six-symbol alphabet. In the case i = 3, an exact formula for sc(L^3) was obtained for m > 3, and its tightness is witnessed by a family of languages over a four-symbol alphabet. For the square, i.e. if i = 2, the upper bound is the one given by the state complexity of catenation, sc(L^2) = m2^m − 2^{m−1}, and it is met by a language accepted by an m-state DFA with only one final state. In the case of l final states, the upper bound is (m − l)2^m + l·2^{m−1}. Cevorova et al. [?] proved that those upper bounds are tight in the ternary case for every l ∈ [1, m − 2]. The nondeterministic state complexity of L^i was proved to be im. This bound is tight over a binary alphabet, for m > 2. The power of unary languages was studied by Rampersad [?]. If L is a unary language with sc(L) = m > 2, then sc(L^i) = im − i + 1. For the square, Cevorova et al. showed that all the complexities in the range [1, 2m − 1] can be attained for m > 5.

Cyclic Shift The cyclic shift of a language L is defined as L^↻ = {vu | uv ∈ L}. Maslov [136] gave an upper bound of (m2^m − 2^{m−1})^m for the state complexity of cyclic shift and an asymptotic lower bound of (m − 3)^{m−3} · 2^{(m−3)²}, considering languages over a growing alphabet (if complete DFAs are considered). Jiraskova and Okhotin [117] reviewed and improved Maslov's results. Using a fixed four-symbol alphabet, they obtained a lower bound of (m − 1)! · 2^{(m−1)(m−2)}, m > 3, which shows that sc(L^↻) = 2^{m² + m log m − O(m)} for alphabets of size greater than 3. For binary and ternary languages, they proved that the state complexity is 2^{Θ(m²)}. As this function grows faster than the number of DFAs with m states, there must exist some magic numbers for the state complexity of the cyclic shift of languages over a fixed alphabet.

The nondeterministic state complexity of this operation was shown to be 2m² + 1, for m > 2, and the upper bound is tight for binary languages. Despite the hardness of this operation in the deterministic case, its nondeterministic state complexity is relatively low. For a unary language L, as L^↻ = L, one gets sc(L^↻) = sc(L) and nsc(L^↻) = nsc(L).
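For finite languages, the definition of L^↻ can be applied directly. The following sketch is a hypothetical helper operating on sets of strings, not the automaton constructions behind the bounds above. It also illustrates that the operation is the identity on unary languages.

```python
def cyclic_shift(language):
    """L^cyc = {vu | uv in L} for a finite language given as a set of strings.
    The split index runs up to len(w) so that the empty word is kept."""
    return {w[i:] + w[:i] for w in language for i in range(len(w) + 1)}
```

For example, the cyclic shift of {abc} is {abc, bca, cab}, while any set of unary words is mapped to itself.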

Shuffle The shuffle operation of two words w1, w2 ∈ Σ* is defined by

w1 ⧢ w2 = {u1v1 ··· umvm | ui, vi ∈ Σ*, i ∈ [1, m], w1 = u1 ··· um and w2 = v1 ··· vm}.

This operation is extended trivially to languages. If two languages are regular, their shuffle is also a regular language. Campeanu et al. [39] showed that the state complexity of the shuffle of two regular languages L1 and L2 is less than or equal to 2^{mn} − 1. They proved that this bound is tight for witness languages over a five-symbol alphabet if minimal incomplete DFAs are considered (see Figure 9). Thus, for complete DFAs, sc(L1 ⧢ L2) is at least 2^{(m−1)(n−1)} − 1.



Figure 9: Incomplete DFAs for the tight upper bound of state complexity of shuffle. 

Various restrictions and generalizations of the shuffle operation have been studied. Mateescu et al. [?] introduced the shuffle of two languages L1 and L2 on a set of trajectories T ⊆ {0,1}*, denoted L1 ⧢_T L2. When L1, L2, and T are regular languages, L1 ⧢_T L2 is a regular language. In particular, if T = {0,1}*, then L1 ⧢_T L2 = L1 ⧢ L2; and if T = {0}*{1}*, then L1 ⧢_T L2 = L1L2. Domaratzki and Salomaa [53] studied the state complexity of the shuffle on regular trajectories. They gave a general upper bound for sc(L1 ⧢_T L2), and tight bounds when T belongs to special families of regular languages.
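The shuffle of words and its trajectory-based generalization can be illustrated on finite sets of strings. In the sketch below, a trajectory letter 0 takes the next symbol of the left word and a 1 takes the next symbol of the right word. This convention matches the two special cases given above: T = {0,1}* yields the shuffle, and T = {0}*{1}* yields catenation. Function names are hypothetical, and infinite languages or trajectory sets are not handled.

```python
from functools import lru_cache


def shuffle_words(u, v):
    """All interleavings of u and v (the shuffle u ⧢ v)."""
    @lru_cache(maxsize=None)
    def interleavings(i, j):
        if i == len(u):
            return frozenset({v[j:]})
        if j == len(v):
            return frozenset({u[i:]})
        return (frozenset(u[i] + s for s in interleavings(i + 1, j))
                | frozenset(v[j] + s for s in interleavings(i, j + 1)))
    return set(interleavings(0, 0))


def shuffle_on_trajectory(u, v, t):
    """Merge u and v along trajectory t, or return None if t does not fit."""
    if t.count("0") != len(u) or t.count("1") != len(v):
        return None
    it_u, it_v = iter(u), iter(v)
    return "".join(next(it_u) if c == "0" else next(it_v) for c in t)


def shuffle_languages_on(l1, l2, trajectories):
    """L1 ⧢_T L2 for finite sets of words and trajectories."""
    result = set()
    for u in l1:
        for v in l2:
            for t in trajectories:
                w = shuffle_on_trajectory(u, v, t)
                if w is not None:
                    result.add(w)
    return result
```

For instance, shuffle_words("ab", "c") returns {abc, acb, cab}. Restricting the trajectories to 0*1* turns the same function into catenation.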


Orthogonal Catenation A language L is the orthogonal catenation of L1 and L2, denoted by L = L1 ⊙⊥ L2, if every word w of L can be obtained in just one way as a catenation of a word of L1 and a word of L2. If catenation uniqueness does not hold for every word of L, the orthogonal catenation of L1 and L2 is undefined; otherwise L1 and L2 are orthogonal. Daley et al. [19] studied the state complexity of orthogonal catenation and generalized orthogonality to other operations. Although it is a restricted operation, it only halves the state complexity of general catenation, i.e., the bound is m2^{n−1} − 2^{n−2} for m > 3 and n > 4. The tight bound was obtained for languages over a four-symbol alphabet. Concerning nondeterministic state complexity, one has nsc(L1 ⊙⊥ L2) = nsc(L1) + nsc(L2), which coincides with the one for (general) catenation. The witness languages presented for catenation are orthogonal (see page 13), and thus also apply to orthogonal catenation.
Unique Regular Operations Similar to orthogonality is the concept of unique operation introduced by Rampersad et al. [?]. However, instead of demanding that every pair of words of the operand languages lead to a distinct word in the resulting language, the language resulting from a unique operation only contains the words that are uniquely obtained through the given operation. Rampersad et al. studied several properties of unique operations and of their poly counterparts (i.e. where each resulting word must be obtained in more than one way), such as closure, ambiguity, and membership and non-emptiness decision problems. Results on state complexity and nondeterministic state complexity were obtained for unique union (L1 ∪° L2), unique catenation (L1 ∘ L2), unique square (L ∘ L = L^{∘2}), and unique star (L°). The state complexity of L1 ∪° L2 is mn, and witness binary languages are {x ∈ {a,b}* | #_a(x) ≡ m − 1 (mod m)} and {x ∈ {a,b}* | #_b(x) ≡ n − 1 (mod n)}, for m, n > 3 (which were also used by Maslov [136] for general union). For unique catenation, sc(L1 ∘ L2) ≤ m3^n − l3^{n−1}, which is much higher than the bound for general catenation. Whether this bound is tight is an open problem, although several examples for specific values of m and n were presented. However, for the unique square sc(L^{∘2}) = m3^m − 3^{m−1}, and the bound is tight for binary languages and m > 3. For the nondeterministic state complexity of unique catenation, an exponential lower bound was provided. An upper bound for the state complexity of the unique star is 3^{m−1} + (l + 2)3^{m−l−1} − (2^{m−l−1} + l − 2). But, again, whether this upper bound is tight is an open problem.
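For finite languages, orthogonality and the unique operations reduce to counting factorizations. This Python sketch uses hypothetical names and handles only finite sets of strings. It treats unique union as the set of words lying in exactly one operand, that is, the symmetric difference. It treats unique catenation as the set of words with exactly one factorization. L1 and L2 are orthogonal exactly when every word of L1L2 has a single factorization.

```python
from collections import Counter


def factorization_counts(l1, l2):
    """For each word of L1L2, the number of pairs (x, y) in L1 x L2 with xy = word."""
    return Counter(x + y for x in l1 for y in l2)


def is_orthogonal(l1, l2):
    return all(c == 1 for c in factorization_counts(l1, l2).values())


def unique_catenation(l1, l2):
    return {w for w, c in factorization_counts(l1, l2).items() if c == 1}


def unique_union(l1, l2):
    """Words obtained from exactly one operand, i.e. the symmetric difference."""
    return set(l1) ^ set(l2)
```

With L1 = {a, ab} and L2 = {b, bb}, the word abb factors both as a·bb and as ab·b. The two languages are therefore not orthogonal, and their unique catenation is {ab, abbb}.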

4.3 Other Subregular Languages

Besides finite and unary languages, several other subregular languages are used in many applications and are now theoretically well studied. Prefix-free or suffix-free languages are examples of codes, which are fundamental in coding theory [?]. Prefix-closed, factor-closed, or subword-closed languages were studied by several authors [?, ?, ?, 72]. These languages belong to a broader set of languages, the convex languages, for which a general framework has recently been addressed by Ang and Brzozowski [2] and Brzozowski et al. [30]. A detailed survey on complexity topics was presented by Brzozowski [?]. Partially based on that survey, here we summarize some of the results concerning the state complexity of regularity-preserving operations over some of the convex subregular languages. Star-free languages are another well-studied family of subregular languages [170, 138]. We briefly address recent results on the (nondeterministic) state complexity of basic regular operations on these languages.

4.3.1 Convex Subregular Languages 

We begin with some definitions and results on determinization for these languages. Let ⪯ be a partial order on Σ*, and let ⪰ be its converse. A language L is ⪯-convex if u ⪯ v and v ⪯ w with u, w ∈ L implies v ∈ L. It is ⪯-free if u ≺ w and w ∈ L implies u ∉ L. It is ⪯-closed if u ⪯ w and w ∈ L implies u ∈ L. It is ⪰-closed if u ⪰ w and w ∈ L implies u ∈ L. The closure and the converse closure operations are:

⪯L = {u | u ⪯ w for some w ∈ L},

L⪯ = {u | w ⪯ u for some w ∈ L}.

The freeness operation L^{≺} can be defined for a language L by

L^{≺} ⊆ L and ∀w ∈ L^{≺}, ∀v ∈ Σ*, v ≺ w implies v ∉ L^{≺}.

The following proposition is from [2], except for the last item.

Proposition 1 Let ⪯ be an arbitrary relation on Σ*. Then

1. A language is ⪯-convex if and only if it is ⪰-convex.

2. A language is ⪯-free if and only if it is ⪰-free.

3. Every ⪯-closed language and every ⪰-closed language is ⪯-convex.

4. A language is ⪯-closed if and only if its complement is ⪰-closed.

5. A language L is ⪯-closed (⪰-closed) if and only if L = ⪯L (L = L⪯).

6. A language L is ⪯-free if and only if L = L^{≺}.

We consider ⪯ to be:

• ≤_p: if u, v, w ∈ Σ* and w = uv, then u is a prefix of w, and we write u ≤_p w.

• ≤_s: if u, v, w ∈ Σ* and w = uv, then v is a suffix of w, and we write v ≤_s w.

• ⊑: if u, x, v, w ∈ Σ* and w = uxv, then x is a factor of w, and we write x ⊑ w. Note that a prefix or suffix of w is also a factor of w. This relation is also called infix.

• ⋐: if w = w0a1w1 ··· anwn, where a1, ..., an ∈ Σ and w0, ..., wn ∈ Σ*, then v = a1 ··· an is a subword of w, and we write v ⋐ w. Note that every factor of w is a subword of w.
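The four partial orders, and the corresponding freeness and closedness tests, can be checked directly on finite languages. The sketch below uses hypothetical helper names, and languages are finite sets of strings. It also shows that factor closure and subword closure differ in general: aa is a subword but not a factor of aba.

```python
from itertools import combinations


def is_prefix(u, w):
    return w.startswith(u)


def is_suffix(u, w):
    return w.endswith(u)


def is_factor(u, w):
    return u in w


def is_subword(u, w):
    it = iter(w)                      # greedy left-to-right matching
    return all(ch in it for ch in u)


def is_free(language, rel):
    """No word of the language is related to a different word of the language."""
    return not any(rel(u, w) for u in language for w in language if u != w)


def closure(language, kind):
    """Prefix, suffix, factor or subword closure of a finite language."""
    result = set()
    for w in language:
        n = len(w)
        if kind == "prefix":
            result |= {w[:i] for i in range(n + 1)}
        elif kind == "suffix":
            result |= {w[i:] for i in range(n + 1)}
        elif kind == "factor":
            result |= {w[i:j] for i in range(n + 1) for j in range(i, n + 1)}
        elif kind == "subword":
            result |= {"".join(w[i] for i in idx)
                       for r in range(n + 1) for idx in combinations(range(n), r)}
        else:
            raise ValueError(kind)
    return result


def is_closed(language, kind):
    return closure(language, kind) == set(language)
```

For example, {ab, b} is prefix-free but not suffix-free, and {ε, a, ab} is prefix-closed.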

If a language is both prefix- and suffix-convex, it is bifix-convex. Bifix-free and bifix-closed languages are defined in the same way. Ideals are languages directly related to closed languages. A non-empty language L ⊆ Σ* is a

• right ideal if L = LΣ* (also called ultimate definite [151]): L is prefix converse-closed, i.e., its complement is prefix-closed.

• left ideal if L = Σ*L (also called reverse ultimate definite [151]): L is suffix converse-closed, i.e., its complement is suffix-closed.

• two-sided ideal if L = Σ*LΣ* (also called central definite): L is bifix converse-closed, i.e., its complement is bifix-closed.

• all-sided ideal if L = Σ* ⧢ L: L is subword converse-closed, i.e., its complement is subword-closed; these ideals were also studied by Haines [?] and Thierrin [176].

Some of the languages defined above are also characterized in terms of 
properties of the finite automata that accept them. In particular: prefix- 
closed languages are accepted by NFAs where all states are final; suffix-closed 
languages are accepted by NFAs where all states are initial; factor-closed 
languages are accepted by NFAs where all states are initial and final; prefix- 
free languages are accepted by non-exiting NFAs (i.e. there are no transitions 
from the final states); suffix-free languages are accepted by non-returning 
NFAs (i.e. there are no transitions to the initial state); and factor-free 
languages are accepted by non-returning and non-exiting NFAs. 
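The automaton characterizations of closed languages translate into one-line constructions on a trim NFA, that is, one in which every state is reachable and co-reachable. Making all states final gives the prefix closure, making all states initial gives the suffix closure, and doing both gives the factor closure. A minimal sketch follows, using a hypothetical tuple encoding of NFAs with a set of initial states. It is only correct under the trimness assumption.

```python
def run(delta, start_states, word):
    current = set(start_states)
    for symbol in word:
        current = {p for q in current for p in delta.get((q, symbol), ())}
    return current


def accepts(nfa, word):
    states, delta, initial, finals = nfa
    return bool(run(delta, initial, word) & finals)


def prefix_closure(nfa):
    """All states final: accepts exactly the prefixes of words of L (trim NFA assumed)."""
    states, delta, initial, finals = nfa
    return states, delta, initial, set(states)


def suffix_closure(nfa):
    """All states initial: accepts exactly the suffixes of words of L (trim NFA assumed)."""
    states, delta, initial, finals = nfa
    return states, delta, set(states), finals


def factor_closure(nfa):
    """All states initial and final: accepts exactly the factors of words of L."""
    states, delta, initial, finals = nfa
    return states, delta, set(states), set(states)


# Hypothetical example: a trim NFA for {ab}.
AB = ({0, 1, 2}, {(0, "a"): {1}, (1, "b"): {2}}, {0}, {2})
```

On AB, the three constructions accept {ε, a, ab}, {ε, b, ab} and {ε, a, b, ab}, respectively.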

The state complexity of determinization on some subregular languages (or for the kinds of NFAs by which they are defined) was recently studied by Bordihn et al. [9], Jui-Yi Kao et al. [?], and Jiraskova et al. [?]. Table 6 presents some of the values for the languages considered above. The existence of magic numbers for some subregular languages was studied by Holzer et al. [88]. As can be seen in Table 6, m is the only magic number for all free languages and for both prefix- and factor-closed languages (except if m = 1, where m is non-magic). Suffix-closed languages have no magic numbers.
Free

VI 

^ |S| 

E |S| 

2m-l ^ ^ 

3 

2m-l ^ ^ 

3 

2m-2 2 

3 

]m,2™-i 1] 


]m,2™-2 -P2] 

Closed 

VI 

^ |E| 

E |S| 

2™ 

3 

2^-1 1 

4 

2m-l 

4 

]m, 2™] 

[m, 2"^-! -h 1] 

]m,2™-i -h 1] 

Ideal 

right S 

left |S| 

two-sided S 

2^-1 

2 

2m-l 1 

3 

2™-2 -p 1 

3 


Table 6: State complexity of determinization of free, closed and ideal languages, considering the prefix, suffix and factor partial orders, respectively. For each class of free and closed languages, the range of the corresponding non-magic numbers appears on the second row.

Free languages Table 7 summarizes state complexity results of individual operations on prefix-free languages [?, ?, 112, ?, 120, 116, 102, ?]. In the case of state complexity, the results are valid for Boolean operations if m, n > 3; for catenation if m, n > 2; for star if k = 1, then m > 3, if k = 2, then m ≠ 3, and otherwise m > 2; and for reversal if m > 4, and the tight bound cannot be reached if k = 2 [?]. The state complexity of right quotient is 1 if k = 1 and m = 1 or m > n, and if k = 2 and m = 1 or n = 1; furthermore, if m = 2 then sc(L1/L2) = n [102].

Note that here the state complexities of catenation and star are much lower than for general regular languages. Moreover, for the star, the only complexities attained are m − 2, m − 1, and m [?].

Table 8 summarizes the state complexity of some regular operations on suffix-free languages. Han and Salomaa showed that all bounds, except for complementation, difference, and symmetric difference, are tight [80, 81]. Jiraskova and Olejar [119] provided binary witnesses for intersection and union. They also proved that for every integer a between 1 and the respective bound there are languages L1 and L2 such that (n)sc(L1 ∘ L2) = a, for ∘ ∈ {∩, ∪} (with ternary witnesses, except for nsc(L1 ∩ L2), for which the
Prefix-free


sc 

|S| 

Li U L 2 

mn — 2 

2 

Li n L 2 

mn — 2(m + n — 3) 

2 

L 

m 

1 

1 

to 

mn — m — 2n + 4 

3 

(Li © L 2 ) 

mn — 2 

2 

L 1 L 2 

m + n — 2 

1 


n — 1 

2 

L 1 IL 2 

n — m + 2 

1 


m 

2 

L* 

m — 2 

1 


2*71-2 1 

3 

LCS 

(2m - 3)"^-2 

6 


Table 7: State complexity and nondeterministic state complexity of some operations on prefix-free languages




nsc S 


m + n 

2 


mn — (m + n) + 2 

2 


2*77-1 

3 


(m - l)2”-i + 1 

4 





m + n — 1 

1 





m 

1 


m 

1 


2m?‘ — Am + 3 

2 


state complexity of some opera- 




Suffix-free 


sc 

|S| 

nsc 

|S| 

Li U L 2 

mn — {m + n — 2) 

2 

m + n — 1 

2 

Li n L 2 

mn — 2(m + n — 3) 

2 

mn — (m + n — 2) 

2 




2177-1 

3 

L 

m 

1 

< + 2™-3 1 

2 




Q{y/m) 

1 

Li — L 2 

mn — (m + 2n — 4) 

4 



Li © L 2 

mn — {m + n — 2) 

5 



L 1 L 2 

(m - 1)2^-2 + 1 

4 



L* 

2*71-2 

4 




2171-2 ^ 

3 




Table 8: State complexity and nondeterministic state complexity of some operations on suffix-free languages


witnesses are over a four-symbol alphabet). The bounds for difference and symmetric difference are from Brzozowski et al. [22]. Jiraskova et al. [?] proved the results for complementation.

If a language is subword-free, then it is factor-free, and if it is factor-free, then it is bifix-free. Table 9 summarizes the state complexity of some regular operations on bifix-, factor-, and subword-free languages [22]. The tight upper bounds for the state complexity of these operations coincide on the three classes of languages.


Closed Languages and Ideals Table 10 shows the state complexity of some basic operations on prefix-, suffix-, factor-, and subword-closed languages. The state complexity results for operations on factor-closed and subword-closed languages coincide. The state complexity of the closure with respect to each of these partial orders is also considered. Subword and converse subword closures were first studied by Gruber et al. [76, 77] and Okhotin [?]. Brzozowski et al. [23, 21] presented the tight upper bound, but using a growing alphabet. Karandikar and Schnoebelen [127] showed that the exponential blow-up is also required in the binary
Free

Operation | sc                            | |Σ| (bifix) | |Σ| (factor) | |Σ| (subword)
L1 ∪ L2   | mn − m − n                    | 5           | 5            | ≤ m + n − 3
L1 ∩ L2   | mn − 3m − 3n + 12, m, n > 4   | 3           | 3            | m + n − 7
L1 − L2   | mn − 2m − 3n + 9              | 4           | 4            | ≤ m + n − 6
L1 ⊕ L2   | mn − m − n                    | 5           | 5            | m + n − 3
L1L2      | m + n − 2, m, n > 1           | 1           | 1            | 1
L*        | m − 1, m > 2                  | 2           | 2            | 2
L^R       | 2^{m−3} + 2, m > 3            | 2           | 2            | 2^{m−3} − ...

Table 9: State complexity of basic operations on bifix-, factor-, and subword-free languages


Closed 


VI 

Yl 

E,(£ |S|c |S|g 

Li U L 2 

mn 

2 

mn 

4 

mn 

2 

2 

Li n L 2 

mn—m—n-\-2 

2 

mn 

2 

mn—m—n+2 

2 

2 

Li — L 2 

mn — n + 1 

2 

ran 

4 

mn — n + 1 

2 

2 

Li © L 2 

mn 

2 

mn 

2 

mn 

2 

2 

L 1 L 2 

m2"-2-|-2"-2 

3 

mn—fn-\-f 

3 

m + n — 1 

2 

2 

L* 

2™-2 1 

3 

m 

2 

2 

2 

2 


2m-l 

2 

2™-i + 1 

3 

2m-2 2 

3 

2m 

© 

VI 

m 

1 

2 m -1 

2 

2™ - 1 

2 







2^-2 2 

2^(f) 


m — 2 

2 


Table 10: State complexity of some operations on prefix-, suffix-, factor-, and subword-closed languages. The last two columns correspond to factor and subword, respectively. The second-to-last row contains the state complexity of the prefix, suffix, and factor closures, respectively. The last row contains the state complexity of the subword closure, considering unbounded and binary alphabets, respectively.
case. Given a regular language L with sc(L) = m, both the subword closure and the converse subword closure of L have nondeterministic state complexity at most m, and these upper bounds are tight for binary witness languages. Prefix, suffix, and factor closures were studied by Kao et al. [126]. If L does not have ∅ as a quotient, Brzozowski et al. showed that the state complexity of the suffix closure is 2^m − 1 (instead of 2^{m−1}).
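The m-state bound for the subword closure comes from reusing the states of the given automaton. A common construction (assumed here; the cited papers may phrase it differently) keeps every transition p → q and adds an ε-move p → q next to it, so that any letter of a word of L may be skipped. A minimal Python sketch, assuming a partial DFA stored as a dictionary; the function names are hypothetical:

```python
def subword_closure_eps(delta):
    """For each DFA transition (p, a) -> q, add an epsilon-move p -> q."""
    eps = {}
    for (p, _letter), q in delta.items():
        eps.setdefault(p, set()).add(q)
    return eps


def eps_closure(states, eps):
    """All states reachable from `states` using epsilon-moves only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def in_subword_closure(delta, start, finals, word):
    """Simulate the NFA (same states as the DFA) for the subword closure."""
    eps = subword_closure_eps(delta)
    current = eps_closure({start}, eps)
    for letter in word:
        step = {delta[(p, letter)] for p in current if (p, letter) in delta}
        current = eps_closure(step, eps)
    return bool(current & finals)


# Partial DFA for L = {"abc"}: 0 -a-> 1 -b-> 2 -c-> 3, final state 3.
DELTA = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}
```

Only the transition relation grows; the state set is unchanged, which is why the nondeterministic bound is m.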


Ideal      Right: sc / |Σ|            Left: sc / |Σ|           Two-sided, all-sided: sc / |Σ| / |Σ|
L1 ∪ L2    mn − m − n + 2 / 2         mn / 4                   mn − m − n + 2 / 2 / 2
L1 ∩ L2    mn / 2                     mn / 2                   mn / 2 / 2
L1 − L2    mn − m + 1 / 2             mn / 4                   mn − m + 1 / 2 / 2
L1 ⊕ L2    mn / 2                     mn / 2                   mn / 2 / 2
L1L2       m + 2^{n−1} / 1            m + n − 1 / 1            m + n − 1 / 1 / 3
L*         m + 1 / 2                  m + 1 / 2                m + 1 / 2 / 2
L^R        2^{m−1} / 2                2^{m−1} + 1 / 3          2^{m−2} + 1 / 3 / 2m − 4

If ε ∈ L, then L = Σ* and sc(L*) = 1.

Table 11: State complexity of basic operations on ideals. The last two columns correspond to two-sided and all-sided ideals, respectively.


If L is a right (respectively, left, two-sided, all-sided) ideal, any language G ⊆ Σ* such that L = GΣ* (respectively, L = Σ*G, L = Σ*GΣ*, L = Σ* ⧢ G, where ⧢ denotes shuffle) is a generator of L. Brzozowski and Jiraskova [21] studied the state complexity of operations on ideals. Table 11 presents the state complexity of basic operations on ideals. As stated before, closed languages and ideals are related: the complement of a prefix- (suffix-, factor-, subword-) closed language is a right (left, two-sided, all-sided) ideal. In particular, the state complexities of basic operations on two-sided and all-sided ideals coincide. Brzozowski [?] observed that for the four types of convex languages (prefix, suffix, factor and subword) the state complexity of the Boolean operations is mn.
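For a finite generator G, membership in the four kinds of generated ideals reduces to searching for a word of G among the prefixes, suffixes, factors or subwords of the input. A small Python sketch, restricted to finite generators (the helper names are hypothetical):

```python
def in_right_ideal(w, gens):
    """w in G Sigma*: some prefix of w belongs to G."""
    return any(w.startswith(g) for g in gens)


def in_left_ideal(w, gens):
    """w in Sigma* G: some suffix of w belongs to G."""
    return any(w.endswith(g) for g in gens)


def in_two_sided_ideal(w, gens):
    """w in Sigma* G Sigma*: some factor of w belongs to G."""
    return any(g in w for g in gens)


def in_all_sided_ideal(w, gens):
    """w in Sigma* shuffle G: some scattered subword of w belongs to G."""
    def is_subword(u, v):
        it = iter(v)
        return all(ch in it for ch in u)
    return any(is_subword(g, w) for g in gens)
```

The word acb shows that the all-sided ideal generated by {ab} is strictly larger than the two-sided one.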


Unary convex languages In the case of unary languages, prefix, suffix, 
factor, and subword partial orders coincide. Table 12 summarizes the state 
complexity of basic operations on unary free, unary closed, unary ideals and 
unary convex languages. 
Unary      Free            Closed          Ideal           Convex
L1 ∪ L2    max{m, n}       max{m, n}       min{m, n}       max{m, n}
L1 ∩ L2    m = n           min{m, n}       max{m, n}       max{m, n}
L1 − L2    m               m               n               max{m, n}
L1 ⊕ L2    max{m, n}       max{m, n}       max{m, n}       max{m, n}
L1L2       m + n − 2       m + n − 2       m + n − 1       m + n − 1
L*         m − 2           2               m − 1           m² − 7m + 13
L^R        m               m               m               m

Table 12: State complexity of basic operations on unary convex languages



Operation   Regular: sc / |Σ|                          Ideal: sc / |Σ|
Prefix      m + 1 / 2                                  m + 1 / 2
Suffix      (m − 1)2^{m−2} + 2, m ≥ 4 / 4              n(n − 1)/2 + 1 / 1
Factor      (m − 2)2^{m−3} + 3, m ≥ 4 / 3              n + 1 / 1
Bifix       (m − 2)2^{m−2} + 3, m ≥ 4 / 4              –

Table 13: State complexity of prefix, suffix, factor and bifix operations on regular languages and on ideals (right, left and two-sided, respectively).
Freeness Operations Here we analyse the state complexity of freeness operations for the prefix, suffix, bifix and factor orders, which were studied by Pribavkina and Rodaro [155]. Given a regular language L, the ⊴-free language L^⊴, for ⊴ ∈ {≤, ⪯, ⊑} (the prefix, suffix and factor orders), is, respectively:

• prefix: L^≤ = L − LΣ⁺

• suffix: L^⪯ = L − Σ⁺L

• factor: L^⊑ = L − (Σ⁺LΣ* ∪ Σ*LΣ⁺)

The bifix operation is defined by L^≤ ∩ L^⪯. If L is an ideal, the prefix, suffix and factor operations were studied by Brzozowski and Jiraskova [21]. In this case, the resulting languages are the minimal generators of right, left and two-sided ideals, respectively. Table 13 presents the state complexity of the prefix, suffix, factor and bifix operations on regular languages (and on the corresponding ideals). The state complexity of these operations is much lower for right and two-sided ideals than for general regular languages.
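On a finite language the three operations can be computed directly from their definitions: a word is removed exactly when some other word of L is a proper prefix, suffix or factor of it. The following Python sketch is restricted to finite sets of strings, an assumption made only to illustrate the definitions; it says nothing about state complexity:

```python
def prefix_op(lang):
    """L - L Sigma^+: drop words that have a proper prefix in L."""
    return {w for w in lang if not any(u != w and w.startswith(u) for u in lang)}


def suffix_op(lang):
    """L - Sigma^+ L: drop words that have a proper suffix in L."""
    return {w for w in lang if not any(u != w and w.endswith(u) for u in lang)}


def factor_op(lang):
    """L - (Sigma^+ L Sigma^* U Sigma^* L Sigma^+): drop words with a proper factor in L."""
    return {w for w in lang if not any(u != w and u in w for u in lang)}


def bifix_op(lang):
    """Intersection of the prefix-free and suffix-free parts."""
    return prefix_op(lang) & suffix_op(lang)


L = {"a", "ab", "ba", "bab"}
```

For this L, the prefix operation keeps {a, ba}, the suffix operation keeps {a, ab}, and both the factor and bifix operations keep only {a}.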


4.3.2 Star-free Languages 


Star-free languages are the smallest class containing the finite languages and closed under Boolean operations and catenation. This class of languages corresponds exactly to the regular languages of generalized star height 0. The minimal DFAs of star-free languages are permutation-free (i.e. no word performs a non-trivial permutation of a subset of its states). Bordihn et al. [9] showed that the state complexity of the determinization of a star-free language L is 2^{nsc(L)}. Figure 10 presents a family of ternary NFAs for which the bound is tight. Holzer et al. [88] showed that star-free languages have no magic numbers.
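Permutation-freeness of the minimal DFA is equivalent to aperiodicity of its transition monoid (the theorems of Schützenberger and of McNaughton and Papert), and aperiodicity is easy to test mechanically. The Python sketch below assumes a complete minimal DFA whose letters are given as tuples `t`, with `t[q]` the successor of state `q`. It enumerates the transition monoid and checks that the powers of every element f end in a cycle of length 1, that is, f^k = f^{k+1} for some k:

```python
def compose(f, g):
    """Apply transformation f, then g (states are 0..n-1)."""
    return tuple(g[f[q]] for q in range(len(f)))


def transition_monoid(n, letters):
    """All transformations induced by words; letters maps a letter to a tuple."""
    identity = tuple(range(n))
    monoid, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for t in letters.values():
            g = compose(f, t)  # the word for f followed by one more letter
            if g not in monoid:
                monoid.add(g)
                frontier.append(g)
    return monoid


def is_aperiodic(monoid):
    """True iff the powers of every element end in a cycle of length 1."""
    for f in monoid:
        seen, power = [], f
        while power not in seen:
            seen.append(power)
            power = compose(power, f)
        if len(seen) - seen.index(power) != 1:
            return False
    return True


# (aa)* over {a}: the letter swaps the two states, a non-trivial permutation.
PARITY = {"a": (1, 0)}
# Words over {a, b} containing at least one b.
CONTAINS_B = {"a": (0, 1), "b": (1, 1)}
```

The parity automaton fails the test, matching the fact that (aa)* is not star-free. The "contains a b" automaton passes it.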



Figure 10: Minimal m-state NFAs with equivalent minimal 2^m-state DFAs for star-free languages


Brzozowski and Liu [29] studied the state complexity of the basic regular operations on star-free languages, and their results are summarized in Table 14. The bounds obtained for general regular languages are reached, except for catenation with n = 2, for reversal, and for operations on unary languages. Holzer et al. [95, 96] studied the same languages for operational nondeterministic state complexity. The bounds coincide with the ones for general regular languages and are tight for binary languages. The witness languages for union and catenation are … and …. For intersection, the witnesses are … and a*(ba*)^{n−1}. The first witness for union is also a witness for the star operation. The language family presented in Figure … is star-free and is thus a witness for the reversal operation. On unary star-free languages, the upper bounds for operational nondeterministic state complexity coincide with the general case, except for complementation. Holzer et al. [96] showed that the bounds for reversal and star are tight. For union, the presented lower bound misses the upper bound by one state. For intersection, the presented bound is tight in order of magnitude (O(mn)), and the bound for complementation is O(n²). The lower bound for catenation misses the upper bound for unary general languages by one state.

^In [155] the superscripts for prefix, suffix and factor operations were respectively p, s and L.


Star-free   Star-free: sc / |Σ|                            Unary: sc
L1 ∘ L2     mn / 2                                         max{m, n}
L1L2        (m − 1)2^n + 2^{n−1}, if n ≥ 3 / 4             m + n − 1
            [3m − 2, 3m − 1], if n = 2 / 3
L*          2, if m = 1 / 1                                2, if m = 1
            2^{m−1} + 2^{m−2}, if m ≥ 2 / 4                m, if m ∈ [2, 5]
                                                           m² − 7m + 13, if m > 5
L^R         2^m − 1 / m − 1                                m

Table 14: State complexity of basic regular operations on star-free regular and unary languages, where ∘ ∈ {∪, ∩, −, ⊕}. For the catenation of non-unary star-free languages with n = 2, m ≥ 2. For non-unary star-free languages, if m ∈ [1, 2] the bound for reversal is tight for |Σ| ≥ m, and if m ≥ 3, for |Σ| ≥ m − 1.
4.4 Some More Results


We briefly cite some more work on operational state complexity. Campeanu and Ho [37] and Brzozowski and Konstantinidis [25] considered uniform finite languages. Krieger et al. studied decimations of languages. Campeanu and Konstantinidis [38] analysed a subword closure operation. Union-free languages were considered by Jiraskova and Masopust [113, 114]. The same authors studied the state complexity of projected languages [?]. The chop (or fusion) of two words is their catenation where the touching symbols are merged if equal; otherwise the chop is undefined. The chop operation and its iterated variants (star and plus) were studied by Holzer et al. [87, 85, 86]. The (nondeterministic) state complexity results are similar to the ones for catenation, star and plus, with the exception of chop-star, where the complexities also depend on the alphabet size. This comes as a surprise, since chop-based regular expressions are known to be exponentially more succinct than classical catenation-based ones. Bassino et al. [1] provided upper bounds on the state complexity of basic operations on cofinite languages as a function of the size of the complementary finite language (taken as the sum of the lengths of all its words). The average state complexity on finite languages is addressed in two works. Gruber and Holzer [?] analysed the average state complexity of DFAs and NFAs based on a uniform distribution over finite languages whose longest word has length at most n. Taking the size of a finite language to be the sum of the lengths of all its words, with a corresponding uniform distribution, Bassino et al. [3] established that the average state complexities of the basic regular operations are asymptotically linear.
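The chop definition can be made concrete in a few lines of Python. This only illustrates the word-level operation as defined above. Chop is taken to be undefined when either word is empty, since there is then no touching symbol. The function names are hypothetical:

```python
def chop(u, v):
    """Return u'av' if u = u'a and v = av' (merge the equal touching symbols),
    or None when the chop is undefined."""
    if not u or not v or u[-1] != v[0]:
        return None
    return u + v[1:]


def chop_languages(lang1, lang2):
    """Chop of two finite languages: all defined chops of pairs of words."""
    results = (chop(u, v) for u in lang1 for v in lang2)
    return {w for w in results if w is not None}
```

For example, the chop of abc and cde is abcde, one letter shorter than their catenation.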

5 State Complexity of Combined Operations 

The number of standard individual operations on regular languages is clearly limited, and almost all of their state complexities have already been obtained. However, in many practical cases, not only these individual operations but also their combinations are used. Examples are the operations expressed by regular expressions in the programming language Perl. These combinations are called combined operations.

In 2011, Salomaa et al. [164] proved that there is no algorithm that, given a composition of basic regularity-preserving operations, computes the state complexity of the corresponding composed operation. The undecidability result already holds for arbitrary compositions of intersection and marked concatenation, and the proof relies on a reduction from Hilbert's Tenth Problem. Composing the state complexities of the individual component operations of a combined operation does give an upper bound for the state complexity of the combined operation. However, this upper bound is usually too high to be meaningful [131, 163, 182]. For example, let L1 and L2 be regular languages accepted by an m-state and an n-state DFA, respectively. The exact state complexity of (L1 ∪ L2)* is 2^{m+n−1} − 2^{m−1} − 2^{n−1} + 1, while the composition of their individual state complexities is 2^{mn−1} + 2^{mn−2}. Clearly, O(2^{m+n}) and O(2^{mn}) are totally different.
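The gap in this example is easy to quantify numerically. The Python sketch below simply evaluates the two formulas quoted above, using the individual bounds mn for union and 2^{k−1} + 2^{k−2} for the star of a language accepted by a k-state DFA:

```python
def exact_star_of_union(m, n):
    """State complexity of (L1 U L2)* for m- and n-state DFAs."""
    return 2 ** (m + n - 1) - 2 ** (m - 1) - 2 ** (n - 1) + 1


def composed_bound(m, n):
    """Compose the individual bounds: union gives k = mn, then star of k states."""
    k = m * n
    return 2 ** (k - 1) + 2 ** (k - 2)


for m, n in [(3, 3), (4, 5), (5, 5)]:
    print(m, n, exact_star_of_union(m, n), composed_bound(m, n))
# 3 3 25 384
# 4 5 233 786432
# 5 5 481 25165824
```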

Since the number of combined operations is unlimited and the state complexities of many of them are very difficult to compute, it would be useful to have a general estimation method that produces upper bounds on the state complexities of combined operations close enough to be used in practice. Such an estimation method has been proposed by Esik et al. [56] and by Salomaa and Yu [169]. A further concept in this direction, approximation of state complexity, was introduced by Gao and Yu [66].

In the following, we will survey both the results of state complexities of 
combined operations and the results of estimations and approximations of 
state complexities of combined operations. 

5.1 State Complexity of Combined Operations on Regular 
Languages 

The state complexities of a number of basic combined operations on regular languages have been studied. Most of these combined operations are composed of two basic individual operations. The results are shown in Table 15.

In 1996, Birget [8] obtained the state complexity of Σ*L, where L is a regular language. This combination of complementation, catenation and star is the first combined operation composed of different individual operations whose state complexity was established. In 2007, Salomaa et al. [163] pointed out that the mathematical composition of the state complexities of the individual component operations of a combined operation is usually much higher than the state complexity of the combined operation. This is because the result of one component operation may not be among the worst cases of the succeeding component operation. They established the state complexity of (L1 ∪ L2)* and indicated that the state complexity of (L1 ∩ L2)* should be at least reasonably close to the mathematical composition of the state complexities of intersection and star. Later, Jiraskova and Okhotin [118] proved that the state complexity of (L1 ∩ L2)* is exactly the mathematical composition of the state complexities of intersection and star.

Gao et al. [?], in 2008, established the state complexities of (L1L2)* and (L1^R)*, where L1 and L2 are regular languages. The state complexity of (L1L2)* is 2^{m+n−1} + 2^{m+n−4} − 2^{m−1} − 2^{n−1} + m + 1, which is lower than the mathematical composition of the state complexities of catenation and star. Interestingly, the state complexity of (L^R)* is the same as that of L^R, which is 2^m. The worst-case example over a three-letter alphabet for L^R in [184] also works for (L^R)*.

In 2008, Liu et al. [?] studied the state complexities of (L1 ∪ L2)^R, (L1 ∩ L2)^R, and (L1L2)^R, where L1 and L2 are regular languages. A tight bound for (L1 ∪ L2)^R was proved. The state complexity of (L1 ∩ L2)^R is the same as that of (L1 ∪ L2)^R, by De Morgan's laws and the fact that reversal commutes with complementation. Liu et al. also gave an upper bound for the last combined operation, which Cui et al. [?] proved to be tight in 2012.

Cui et al. [16] established the state complexities of L1(L2 ∪ L3) and L1(L2 ∩ L3) in 2011. The state complexity of L1(L2 ∪ L3) is lower than the mathematical composition of the state complexities of union and catenation, whereas the state complexity of L1(L2 ∩ L3) is the same as the corresponding composition.

In 2012, Jiraskova and Shallit [?] proved that the state complexity of the combined operation star-complement-star (the star of the complement of L1*) is 2^{Θ(m log m)}, where L1 is a regular language accepted by an m-state DFA. A seven-letter alphabet was used in the proof of the lower bound.

Gao et al. presented the state complexities of four combined operations: L1* ∪ L2, L1* ∩ L2, L1^R ∪ L2, and L1^R ∩ L2, where L1 and L2 are regular languages accepted by m- and n-state DFAs, respectively. The state complexities of all four combined operations are n − 1 less than the mathematical composition of the state complexities of their component operations. Although the gaps are the same, the reasons causing them are different. For L1* ∪ L2 and L1* ∩ L2, the gap n − 1 exists because there are n − 1 unreachable states in the constructions of the resulting DFAs. For L1^R ∪ L2 and L1^R ∩ L2, it exists because n states are equivalent and can be merged into one in the constructions.
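The n − 1 gap can be checked by evaluating both sides. The sketch below assumes the individual bounds 2^{m−1} + 2^{m−2} for star, 2^m for reversal and mn for union or intersection, and takes the exact values from Table 15:

```python
def star_sc(m):
    return 2 ** (m - 1) + 2 ** (m - 2)  # equals 3 * 2**(m - 2) for m >= 2


def reversal_sc(m):
    return 2 ** m


def gap_star_then_boolean(m, n):
    composed = star_sc(m) * n               # star of L1, then union/intersection with L2
    exact = 3 * 2 ** (m - 2) * n - n + 1    # exact value from Table 15
    return composed - exact


def gap_reversal_then_boolean(m, n):
    composed = reversal_sc(m) * n           # reversal of L1, then union/intersection
    exact = 2 ** m * n - n + 1              # exact value from Table 15
    return composed - exact
```

Both functions return n − 1 for every m ≥ 2 and n ≥ 1, which is the common gap described above.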

Cui et al. [?] gave the state complexities of a number of combined operations, including L1*L2, L1L2*, L1^R L2, L1L2^R, (L1 ∪ L2)L3, (L1 ∩ L2)L3, L1L2 ∪ L3, and L1L2 ∩ L3. The state complexities of the first five combined operations are lower than the corresponding mathematical compositions, and the state complexities of the others are the same as the compositions. The state complexity of L1L2^R is equal to that of catenation combined with antimorphic involution (L1θ(L2)), an operation motivated by biology [48]. Up to now, the state complexities of all the combined operations composed of two basic individual operations have been obtained. These results will serve as the basis for future research on the state complexities of combined operations with more complex structures.

Regular                        sc                                                                                  |Σ|
Σ*L1                           see [8]                                                                             2
star-complement-star of L1     2^{Θ(m log m)} [?]                                                                  7
(L1 ∪ L2)*                     2^{m+n−1} − 2^{m−1} − 2^{n−1} + 1 [163]                                             2
(L1 ∩ L2)*                     2^{mn−1} + 2^{mn−2} [118]                                                           6
(L1L2)*                        2^{m+n−1} + 2^{m+n−4} − 2^{m−1} − 2^{n−1} + m + 1 [?]                               4
(L1^R)* = (L1*)^R              2^m [?]                                                                             3
(L1 ∪ L2)^R                    2^{m+n} − 2^m − 2^n + 2 [?]                                                         3
(L1 ∩ L2)^R                    2^{m+n} − 2^m − 2^n + 2 [?]                                                         3
(L1L2)^R                       3·2^{m+n−2} − 2^n + 1 [?]                                                           4
L1*L2                          5·2^{m+n−3} − 2^{m−1} − 2^n + 1 [?]                                                 4
L1L2*                          (3m − 1)2^{n−2} [?]                                                                 3
L1^R L2                        3·2^{m+n−2} [?]                                                                     4
L1L2^R                         m2^n − 2^{n−1} − m + 1 [?]                                                          3
L1(L2 ∪ L3)                    (m − 1)(2^{n+p} − 2^n − 2^p + 2) + 2^{n+p−2} [?]                                    4
L1(L2 ∩ L3)                    m2^{np} − 2^{np−1} [?]                                                              4
L1* ∪ L2                       3·2^{m−2}·n − n + 1 [?]                                                             3
L1* ∩ L2                       3·2^{m−2}·n − n + 1 [?]                                                             3
L1^R ∪ L2                      2^m·n − n + 1 [?]                                                                   4
L1^R ∩ L2                      2^m·n − n + 1 [?]                                                                   4
(L1 ∪ L2)L3                    mn2^p − (m + n − 1)2^{p−1} [?]                                                      4
(L1 ∩ L2)L3                    mn2^p − 2^{p−1} [?]                                                                 4
L1L2 ∪ L3                      (m2^n − 2^{n−1})p [?]                                                               4
L1L2 ∩ L3                      (m2^n − 2^{n−1})p [?]                                                               3
L1L2L3                         m2^{n+p} − 2^{n+p−1} − (m − 1)2^{n+p−2} − 2^{n+p−3} − (m − 1)(2^p − 1) [56]         5

Table 15: State complexities of some basic combined operations on regular languages

Besides these basic combined operations, a few combined operations on k operand regular languages have also been investigated, e.g. (⋃_{i=1}^{k} L_i)*. These results are summarized in Table 16. The state complexity of L1 ∩ L2 ∩ ... ∩ Lk, k ≥ 2, was shown to be n1n2···nk by Birget [7] and, in 1991, by Yu and Zhuang [183], where Li is a regular language accepted by an ni-state DFA, 1 ≤ i ≤ k. Esik et al. later extended the result to combined Boolean operations. A combined Boolean operation f(L1, L2, ..., Lk) is a function that can be constructed by function composition from the projection functions and the binary union, binary intersection and complementation operations, e.g. L1 ∪ L2 ∩ L3 ∩ ... ∩ Lk. Its state complexity was also proved to be n1n2···nk. Esik et al. [56] presented the state complexities of L1L2L3 and L1L2L3L4 in the same paper. The worst-case examples for these two combined operations are modifications of the worst-case examples proposed by Yu et al. [?] for catenation. On the basis of these results, Gao [?] established the state complexity of L1L2...Lk, whose formula is too complex to reproduce here.
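The n1n2···nk upper bound for combined Boolean operations comes from the product construction: run all k DFAs in parallel and decide acceptance from the tuple of their individual verdicts. A minimal Python sketch follows. It assumes complete DFAs given as (transition dict, start state, set of final states), and the helper names are hypothetical. The code only illustrates the upper bound, not the tightness witnesses of [56]:

```python
def boolean_product(dfas, alphabet, combine):
    """Reachable part of the product DFA.

    dfas: list of (delta, start, finals), delta[(q, a)] gives the next state.
    combine: maps a tuple of booleans (one per DFA) to True/False.
    """
    start = tuple(start_q for _, start_q, _ in dfas)
    states, frontier, delta = {start}, [start], {}
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = tuple(d[(q, a)] for (d, _, _), q in zip(dfas, s))
            delta[(s, a)] = t
            if t not in states:
                states.add(t)
                frontier.append(t)
    finals = {
        s for s in states
        if combine(tuple(q in f for (_, _, f), q in zip(dfas, s)))
    }
    return states, delta, start, finals


def accepts(dfa, word):
    states, delta, q, finals = dfa
    for a in word:
        q = delta[(q, a)]
    return q in finals


# a^k with k even, and a^k with k divisible by 3 (unary DFAs).
MOD2 = ({(0, "a"): 1, (1, "a"): 0}, 0, {0})
MOD3 = ({(0, "a"): 1, (1, "a"): 2, (2, "a"): 0}, 0, {0})
# L1 intersected with the complement of L2.
product_dfa = boolean_product([MOD2, MOD3], "a", lambda v: v[0] and not v[1])
```

Here all 2 · 3 = 6 product states are reachable, which matches n1n2 for this particular pair.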

In 2012, Gao et al. [62] gave the state complexities of a series of combined operations composed of arbitrarily many individual operations, including (⋃_{i=1}^{k} L_i)*, (⋃_{i=1}^{k} L_i)², ⋃_{i=1}^{k} L_i*, ⋂_{i=1}^{k} L_i*, ⋃_{i=1}^{k} L_i², ⋂_{i=1}^{k} L_i², ⋃_{i=1}^{k} L_i^R, and ⋂_{i=1}^{k} L_i^R. Tight bounds were established for all these combined operations.

Table 16 shows that all the results on the state complexities of combined operations on k operand languages were proved with increasing alphabets. Clearly, it is comparatively easier to design worst-case examples with increasing alphabets than with fixed ones. However, the most crucial reason is a different one. It is impossible to design a worst-case example for a combined operation on arbitrary k operand languages that are over a fixed alphabet and accepted by arbitrary n1-, n2-, ..., nk-state DFAs, respectively. This is because there exist only a limited number of different DFAs with a fixed number of states if the alphabet is fixed. Therefore, when k is large enough and each ni is an arbitrary positive integer, 1 ≤ i ≤ k, some of the DFAs may have the same number of states, and by the pigeonhole principle some of them may indeed be the same [62]. Thus, research on the state complexities of combined operations on k operand languages generally uses increasing alphabets.
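The finiteness behind this pigeonhole argument is easy to make explicit. Consider complete DFAs on the state set {0, ..., n − 1} with initial state 0 over an s-letter alphabet. Each of the n·s transitions has n possible targets, and each state is either final or not. These are labelled automata, so the count over-counts non-isomorphic ones. A short Python sketch of the count and of the resulting pigeonhole threshold (the function names are hypothetical):

```python
def count_labelled_dfas(n, s):
    """Complete DFAs with states 0..n-1, initial state 0, s letters."""
    return n ** (n * s) * 2 ** n


def must_repeat(k, n, s):
    """If k operands all have n states over an s-letter alphabet and k
    exceeds the number of such DFAs, two operands are identical."""
    return k > count_labelled_dfas(n, s)
```

For instance, there are only 16 labelled 2-state unary DFAs, so any 17 such operands must include a repeated automaton.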


5.2 State Complexity of Combined Operations on Prefix-free 
Regular Languages 

Since the research history of combined operations is much shorter than that of individual operations, a lot of work remains to be done on the state complexity of combined operations for subregular language classes. The state complexities of several combined operations on prefix-free regular languages were obtained by Han et al. [?] in 2010. These results are shown in Table 17.

5.3 Estimation and Approximation of State Complexity of 
Combined Operations 

We can identify at least two problems concerning the state complexities of combined operations. First, the state complexities of combined operations composed of large numbers of individual operations are extremely difficult to compute. Second, a large proportion of the results obtained so far are very complex and hard to comprehend [65]. For example, Esik et al. [56] showed that the state complexity of the catenation of four regular languages with state complexities m, n, p, q, respectively, is a long and complicated expression in m, n, p and q.

Clearly, in these situations, close estimations and approximations of state complexities are usually good enough to use.

5.3.1 Estimation of State Complexity of Combined Operations 

An estimation method that obtains an upper bound through nondeterministic state complexity was first introduced by Salomaa and Yu [169]. Assume we are considering the combination of a language operation g1 with k arguments together with operations g2^i, i = 1, ..., k. The nondeterministic estimation upper bound, or NEU-bound, for the deterministic state complexity of the combined operation g1(g2^1, ..., g2^k) is calculated as follows:

(i) Let the arguments of the operation g2^i be DFAs A_j^i with m_j^i states, i = 1, ..., k, j = 1, ..., r_i, r_i ≥ 1.
Regular                               sc                                                                                                                       |Σ|
(⋃_{i=1}^{k} L_i)*                    ∏_{i=1}^{k} (2^{n_i−1} − 1) + 2^{Σ_{j=1}^{k} n_j − k} [61]                                                              2k + 1
(⋃_{i=1}^{k} L_i)²                    ∏_{h=1}^{k} (n_h − 1)[∏_{i=1}^{k} (2^{n_i} − 1) + 1] + [∏_{i=1}^{k} n_i − ∏_{i=1}^{k} (n_i − 1)] 2^{Σ_{j=1}^{k} n_j − k} [?]   2k + 1
⋃_{i=1}^{k} L_i*                      see [63]                                                                                                                 2k
⋂_{i=1}^{k} L_i*                      see [63]                                                                                                                 2k
⋃_{i=1}^{k} L_i²                      ∏_{i=1}^{k} (n_i 2^{n_i} − 2^{n_i−1}) [62]                                                                              2k
⋂_{i=1}^{k} L_i²                      ∏_{i=1}^{k} (n_i 2^{n_i} − 2^{n_i−1}) [62]                                                                              2k
⋃_{i=1}^{k} L_i^R                     ∏_{i=1}^{k} (2^{n_i} − 1) + 1 [62]                                                                                      3k
⋂_{i=1}^{k} L_i^R                     ∏_{i=1}^{k} (2^{n_i} − 1) + 1 [62]                                                                                      3k
A Boolean operation f(L1, ..., Lk)    n1n2···nk [7, 56, 183]                                                                                                   2k
L1L2···Lk                             see details in [56, 60, 65]                                                                                              2k − 1

Table 16: State complexities of some combined operations on k regular languages, k ≥ 2
Prefix-free regular    sc                                 |Σ|
(L1 ∪ L2)*             5·2^{m+n−6} [?]                    4
(L1 ∩ L2)*             mn − 2(m + n) + 6 [84]             4
(L1L2)*                m + n − 2 [?]                      2
(L1^R)* = (L1*)^R      2^{m−2} + 1 [83]                   3

Table 17: State complexities of some combined operations on prefix-free regular languages


(ii) The nondeterministic state complexity of the combined operation is at most the composition of the individual nondeterministic state complexities. Hence the language

$$g_1\big(g_2^1(L(A_1^1), \ldots, L(A_{r_1}^1)), \ldots, g_2^k(L(A_1^k), \ldots, L(A_{r_k}^k))\big)$$

has an NFA with at most

$$\mathrm{nsc}(g_1)\big(\mathrm{nsc}(g_2^1)(m_1^1, \ldots, m_{r_1}^1), \ldots, \mathrm{nsc}(g_2^k)(m_1^k, \ldots, m_{r_k}^k)\big)$$

states, where nsc(g) is the nondeterministic state complexity (as a function) of the language operation g.

(iii) Consequently, the deterministic state complexity of the combined operation g1(g2^1, ..., g2^k) is bounded above by

$$2^{\mathrm{nsc}(g_1)\big(\mathrm{nsc}(g_2^1)(m_1^1, \ldots, m_{r_1}^1), \ldots, \mathrm{nsc}(g_2^k)(m_1^k, \ldots, m_{r_k}^k)\big)}.$$
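Once the nondeterministic state complexities of the individual operations are known, the NEU-bound is mechanical. The Python sketch below assumes the standard tight values: m + n + 1 for union, mn for intersection, m + n for catenation, and m + 1 for star and reversal. These values are not stated in this section. The sketch composes them exactly as in steps (ii) and (iii), and reproduces the NEU-bounds listed for the four star operations in Table 18:

```python
# Nondeterministic state complexities of individual operations (assumed values).
def nsc_union(m, n):
    return m + n + 1


def nsc_intersection(m, n):
    return m * n


def nsc_catenation(m, n):
    return m + n


def nsc_star(m):
    return m + 1


def nsc_reversal(m):
    return m + 1


def neu(nfa_states):
    """Step (iii): the subset construction yields at most 2**N DFA states."""
    return 2 ** nfa_states


# Step (ii): an m-state DFA is also an m-state NFA, so compose from m and n.
def neu_star_of_union(m, n):
    return neu(nsc_star(nsc_union(m, n)))


def neu_star_of_intersection(m, n):
    return neu(nsc_star(nsc_intersection(m, n)))


def neu_star_of_catenation(m, n):
    return neu(nsc_star(nsc_catenation(m, n)))


def neu_star_of_reversal(m):
    return neu(nsc_star(nsc_reversal(m)))
```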


Table 18 shows the state complexities and the corresponding NEU-bounds of four combined operations [169]: (1) star of union, (2) star of intersection, (3) star of catenation, and (4) star of reversal. This method works well when a combined operation ends with the star operation. However, in general it does not work well for combined operations that end with reversal [56, 169]. For example, the state complexity of (L(A) ∩ L(B))^R is 2^{m+n} − 2^m − 2^n + 2, where A and B are m-state and n-state DFAs, respectively. The above method, however, gives the estimate 2^{mn+1}. We note that in this particular case, if reversal is distributed over intersection, we can again recover a good estimate. Thus, it may be possible to have a general estimation method that takes into account algebraic properties of the considered model.

Regular        sc                                         NEU-bound
(L1 ∪ L2)*     2^{m+n−1} − 2^{m−1} − 2^{n−1} + 1          2^{m+n+2}
(L1 ∩ L2)*     (3/4)·2^{mn}                               2^{mn+1}
(L1L2)*        2^{m+n−1} − 2^{n−1} + m + 1                2^{m+n+1}
(L1^R)*        2^m                                        2^{m+2}

Table 18: State complexities of four combined operations and their corresponding NEU-bounds on regular languages [55]
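The distance between the estimate and the true value for (L(A) ∩ L(B))^R can be seen by comparing the two quantities quoted above directly. The sketch below only evaluates those formulas. It takes the nondeterministic state complexity of intersection as mn and that of reversal as N + 1, the same assumption as in the NEU computation:

```python
def neu_reversal_of_intersection(m, n):
    # nsc(intersection) = mn, nsc(reversal) = N + 1, then the subset construction.
    return 2 ** (m * n + 1)


def exact_reversal_of_intersection(m, n):
    return 2 ** (m + n) - 2 ** m - 2 ** n + 2


for m, n in [(3, 3), (4, 4), (5, 6)]:
    print(m, n, exact_reversal_of_intersection(m, n), neu_reversal_of_intersection(m, n))
```

The estimate grows like 2^{mn}, while the true value grows only like 2^{m+n}.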

5.3.2 Approximation of State Complexity of Combined Operations

Although an estimation of the state complexity of a combined operation is simpler and more convenient to use, it does not show how close it is to the actual state complexity. To solve this problem, Gao and Yu [66] proposed the concept of approximation of state complexity. The idea comes from the notion of approximation algorithms [68, 123, 124]. A large number of polynomial-time approximation algorithms have been proposed for many NP-complete problems, e.g. the traveling-salesman problem, the set-covering problem, and the subset-sum problem. Since obtaining an optimal solution for an NP-complete problem is considered intractable, near-optimal solutions obtained by approximation algorithms are often good enough to use in practice. Assume there is a maximization or a minimization problem. An approximation algorithm is said to have a ratio bound of ρ(n) if, for any input of size n, the cost C of the solution produced by the algorithm is within a factor of ρ(n) of the cost C* of an optimal solution [45]:

$$\max\left(\frac{C}{C^*}, \frac{C^*}{C}\right) \le \rho(n).$$
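The max(·,·) form lets one definition cover both minimization (C ≥ C*) and maximization (C ≤ C*). A tiny Python illustration using exact rational arithmetic (the function name is ours):

```python
from fractions import Fraction


def ratio(cost, optimal_cost):
    """max(C / C*, C* / C); always >= 1 for positive costs."""
    c, c_star = Fraction(cost), Fraction(optimal_cost)
    return max(c / c_star, c_star / c)
```

A minimization solution of cost 6 against an optimum of 4 has ratio 3/2. A maximization solution of value 50 against an optimum of 100 has ratio 2.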

The concept of approximation of state complexity is similar to that of approximation algorithms. An approximation of the state complexity of an operation is a close estimation of that state complexity, together with a ratio bound showing the error range of the approximation [?]. Despite these similarities, there are some fundamental differences between an approximation algorithm and an approximation of state complexity. Work on approximation algorithms designs polynomial-time algorithms for NP-complete problems whose results approximate the optimal results. Work on approximation of state complexity, by contrast, searches directly for estimations of state complexities that lie within certain ratio bounds [65]. The aim of designing an approximation algorithm is to transform an intractable problem into one that is easier to compute, with a result that is not optimal but still acceptable. In comparison, an approximation of state complexity may have two different effects:


(1) it gives a reasonable estimation, within some bound, of a state complexity whose exact value is difficult or impossible to compute; or

(2) it gives a simpler and more comprehensible formula that approximates a known state complexity [66].

^This observation was made to us by an anonymous referee.


Gao et al. gave a formal definition of approximation of state complexity in [66]. Let ψ be a combined operation on k regular languages, and assume that the state complexity of ψ is θ. We say that α is an approximation of the state complexity of ψ with the ratio bound ρ if the following holds for all sufficiently large positive integers n1, ..., nk, where n1, ..., nk are the numbers of states of the DFAs accepting the respective argument languages of the operation:

$$\max\left(\frac{\alpha(n_1, \ldots, n_k)}{\theta(n_1, \ldots, n_k)}, \frac{\theta(n_1, \ldots, n_k)}{\alpha(n_1, \ldots, n_k)}\right) \le \rho(n_1, \ldots, n_k).$$

Note that in many cases, ρ is a constant. Table 19 shows some examples of approximations of the state complexity of combined operations.
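As a check on two rows of Table 19, the ratio in this definition can be evaluated exactly. The sketch below takes α = 2^{mn+1} and α = 2^{m+n+2} as the approximations. For θ it uses the exact state complexities of (L1 ∩ L2)* and (L1 ∪ L2)* quoted earlier in this survey. For star of intersection the ratio is exactly 8/3 for all m, n, while for star of union it approaches 8 from above:

```python
from fractions import Fraction


def approximation_ratio(alpha, theta):
    """max(alpha/theta, theta/alpha) from the definition above."""
    a, t = Fraction(alpha), Fraction(theta)
    return max(a / t, t / a)


def theta_star_of_intersection(m, n):
    k = m * n
    return 2 ** (k - 1) + 2 ** (k - 2)


def theta_star_of_union(m, n):
    return 2 ** (m + n - 1) - 2 ** (m - 1) - 2 ** (n - 1) + 1


for m, n in [(2, 3), (5, 5), (10, 10)]:
    r_int = approximation_ratio(2 ** (m * n + 1), theta_star_of_intersection(m, n))
    r_uni = approximation_ratio(2 ** (m + n + 2), theta_star_of_union(m, n))
    print(m, n, r_int, float(r_uni))
```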


6 Conclusions 

In the last two decades, a large number of results have been obtained on the operational state complexity of regular languages. These results are roughly split between individual and combined operations; regular languages and different classes of subregular languages; deterministic and nondeterministic complexity; different alphabet sizes; and worst-case versus average-case analysis. In general, all this work also suggests new directions of research and open problems.
Regular        Approximation            Ratio bound
(L1 ∪ L2)*     2^{m+n+2}                ≈ 8 [66]
(L1 ∩ L2)*     2^{mn+1}                 8/3 [66]
(L1L2)*        2^{m+n+1}                ≈ 4 [66]
(L1^R)*        2^{m+2}                  4 [66]
(L1\R)*        2^{m−1} + 2^{m−2}        1 [?]
L1\R*          2^{m+1}                  1 [?]

Table 19: Approximations of state complexities of six combined operations and their corresponding ratio bounds on regular languages


As is evident from this survey, many results in this area are functions parametrized by some measures, mostly the state complexities of the operation arguments. Given the amount and diversity of these functions, it is useful to have a software tool that helps to structurally organize, visualize and manipulate this information. A first step towards this goal was taken with the development of DesCo, a Web-based information system for descriptional complexity results [156, 146]. DesCo keeps information about language classes, language operations, models of computation, measures of complexity and complexity functions (both operational and transformational). For instance, given an operation, it is possible to obtain the complexity functions for all language classes and all complexity measures (that are registered in the database).

To obtain a witness for a tight upper bound, many authors performed experiments using computer software. The question of why some witnesses work for several (or almost all) complexity bounds has only recently been addressed. Universal witnesses (and their variants) for the operational state complexity of regular languages can be considered a major breakthrough. Conditions for a family of languages to be universal also involve other measures, such as syntactic complexity and the number of atoms. The study of necessary and/or sufficient conditions for the maximality of all these measures is a new direction of research. Other open problems are whether and how this approach extends to other classes of subregular languages and to other complexity measures, in particular to nondeterministic state complexity and transition complexity.
Besides the worst-case complexity of an operation, researchers have also studied the range of possible values that can be achieved, as a function of the complexities of the arguments and the alphabet size. A magic number is a value that cannot occur (for that kind of complexity, operation and alphabet size). In general, if growing alphabet sizes are allowed, no magic numbers exist (and even for binary alphabets they are rare). The distribution of possible complexity values and the density of languages (or tuples of languages) that achieve those values can also be valuable for average-case analysis.

Witnesses with alphabets of increasing size were used in the quest for magic numbers, for the state complexity of certain operations over subregular languages, and for almost all results on combined operations with an arbitrary number of operands. This raises the question of whether the alphabet size should be a parameter of the complexity under study. In particular, two kinds of situations should be investigated: those that cannot be characterized without increasing alphabets, and those for which languages over fixed alphabets may exist but are not yet known.

For many automata applications, a major direction of research is average-case state complexity. An essential question for average-case results is the probability distribution chosen for the models. The few existing results use a uniform distribution, and even in this case the problem is very difficult. Recently, using the framework of analytic combinatorics, some average-case results were obtained for the size of NFAs equivalent to a given regular expression [?]. The average-case computational complexity analysis of the Brzozowski minimization algorithm carried out by De Felice and Nicaud [57, 59] is also worth mentioning. This work can be especially relevant for operational state complexity, because the authors give some characterizations of the state complexity of reversal. Another approach to average-case analysis is to consider experimental results based on samples of uniformly randomly generated automata. There are some random generators for non-isomorphic DFAs [?]. For NFAs, however, the problem seems infeasible in general, since no polynomial-time algorithm for graph isomorphism is known.